Abstract

We study the entropy rate of pattern sequences of stochastic processes, and
its relationship to the entropy rate of the original process. We give a complete
characterization of this relationship for i.i.d. processes over arbitrary alphabets,
stationary ergodic processes over discrete alphabets, and a broad family of
stationary ergodic processes over uncountable alphabets. For cases where the
entropy rate of the pattern process is infinite, we characterize the possible
growth rate of the block entropy.

1 Introduction

In their recent work [H], Orlitsky et al. consider the compression of sequences with 
unknown alphabet size. This work, among others, has created interest in examining 
random processes with arbitrary alphabets which may a priori be unknown. One 
can think of this as a problem of reading a foreign language for the first time. As

one begins to parse characters, one's knowledge of the alphabet grows. Since the 
characters in the alphabet have initially no meaning beyond the order in which they 
appear, one can relabel these characters by the order of their first appearance. Given 
a string, we refer to the relabeled string as the pattern associated with the original 
string. 

Example 1 Assume that the following English sentence was being parsed into a pat- 
tern by a non-English speaker. 

english is hard to learn. . . 

The associated pattern would be 

1, 2, 3, 4, 5, 6, 7, 8, 5, 6, 8, 7, 9, 10, 11, 8, 12, 13, 8, 4, 1, 9, 10, 2, . . . 

regarding the space too as a character. 
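
The relabeling in Example 1 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `pattern` is our own choice and does not come from the cited works.

```python
def pattern(seq):
    """Relabel each symbol by the order of its first appearance (1-based)."""
    first_seen = {}  # symbol -> index assigned at its first appearance
    out = []
    for s in seq:
        if s not in first_seen:
            first_seen[s] = len(first_seen) + 1
        out.append(first_seen[s])
    return out


# The sentence from Example 1, with the space treated as a character.
example = pattern("english is hard to learn")
```

Applied to the sentence of Example 1, the first 24 labels agree with the pattern listed above.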
We abstract this as follows: given a stochastic process $X = \{X_i\}_{i \ge 1}$, we create a
pattern process $Z = \{Z_i\}_{i \ge 1}$.

It is the compression of the pattern process {Zj} that is the focus of both [T] 
and [IT]. This emphasis is justified by the fact that the bulk of the information is in 
the pattern. Although universal compression is an extensively studied problem, the
universal compression of pattern sequences is relatively new (see the works cited in the
references). The majority of these recent papers address universality questions of
how well a pattern sequence associated with an unknown source can be compressed 
relative to the case where this distribution is known. Emphasis is on quantifying 
the redundancy, i.e., the difference between what can be achieved with and without 
knowledge of the source distribution. The main question we focus on in this work is 
how the entropy rate of a sequence and that of its pattern relate. More specifically, 
our goal is to study the relationship between the entropy rate H(X) of the original
process$^1$ $\{X_i\}$, and the entropy rate H(Z) of the associated pattern process. This
relationship is not always trivial, as the following examples illustrate. 

Example 2 Let Xi be drawn i.i.d. ~ P, where P is a pmf on a finite alphabet. Then 
we show below that H(X) = H(Z). 

The intuition behind this result is that given enough time, all the symbols with pos- 
itive probability will be seen, after which time the original process and its associated 
pattern sequence coincide, up to relabeling of the alphabet symbols. 

Example 3 Let $X_i$ be drawn i.i.d. ~ uniform [0,1]. Then the entropy rate of $\{X_i\}$
is $\infty$. Since the probability of seeing the same value twice is zero, $Z_i = i$ w.p. 1 for
all i and, consequently, H(Z) = 0.
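
Examples 2 and 3 can be illustrated with a small simulation. The sketch below is an illustration under our own assumptions (a fixed seed, a three-letter alphabet, 1000 samples), not part of the original argument: it checks that uniform draws give the deterministic pattern 1, 2, ..., n, while for a finite alphabet the pattern is a one-to-one relabeling of the source symbols.

```python
import random


def pattern(seq):
    """Relabel each symbol by the order of its first appearance (1-based)."""
    first_seen = {}
    out = []
    for s in seq:
        if s not in first_seen:
            first_seen[s] = len(first_seen) + 1
        out.append(first_seen[s])
    return out


rng = random.Random(0)  # fixed seed, chosen only for reproducibility

# Example 3: draws from uniform [0,1] repeat with probability zero,
# so the pattern is simply 1, 2, ..., n.
u = [rng.random() for _ in range(1000)]
z_uniform = pattern(u)

# Example 2: i.i.d. draws from a finite alphabet (hypothetical alphabet "abc").
x = [rng.choice("abc") for _ in range(1000)]
z = pattern(x)
relabel = dict(zip(x, z))  # symbol -> pattern label
```

Because `relabel` is one-to-one, the pattern sequence carries the same information as the source sequence once the relabeling is known, which is the intuition behind H(X) = H(Z) in Example 2.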

The connection between the entropy rate of the pattern and that of the original 
process was first studied for i.i.d. processes by Shamir and Song in [T7]. The results 
in [T7| give bounds on the block entropy of the pattern with respect to the block en- 
tropy of the original process. Such bounds naturally extend to bounds on the entropy 
rate. These bounds are improved upon in ^3JE3E5]- The work in |TH IT51 ITS! IT7] is 
primarily focused on finite block entropy. Although such results are extremely useful 
for gaining insight into the finite block entropy behavior, a question different from 
the one we present here, they do not completely characterize the relationship be- 
tween the entropy rate of an i.i.d. process and that of its associated pattern. The first 
complete characterization of this entropy rate relationship for the general i.i.d. case 
as well as Markov, noise-corrupted and finite alphabet stationary ergodic processes, 
is given in [3]. Orlitsky et al. in JU] independently derive the relationship for i.i.d. 
processes. The finite alphabet stationary ergodic result of |2] were later extended to 
general finite entropy discrete stationary ergodic processes in jlj, and independently 
for finite entropy discrete ergodic processes in 0. The uncountable alphabet i.i.d. 

$^1$ Throughout this work, $X_m^n$ will denote the sequence $X_m, X_{m+1}, \ldots, X_n$. If not specified, m will
be assumed to be 1. Furthermore, H(X) will denote entropy rate throughout this work, regardless
of the discreteness of the distributions of $\{X_n\}$ (it should thus be regarded as $\infty$ when these are not
discrete).
results of [3] were also extended to a family of uncountable alphabet processes with
memory in j3] and [5]. The proof techniques used by Orlitsky et al. in [OJ QUI are 
significantly different than those used in |3J E] , the latter we present here. 

In this work we characterize the relationship between process and pattern entropy 
rates for general i.i.d., discrete Markov, and discrete stationary ergodic processes. 
Although the discrete Markov case falls under the more general results for discrete 
stationary ergodic sources, it will be shown that there is insight to be gained by 
exploring the discrete Markov case on its own. We then move on to examine sta- 
tionary ergodic processes, with memory, over uncountable alphabets. In particular, 
we consider the Markov and additive noise case. These two results are then used to 
show a more general result for a broad family of stationary ergodic processes over 
uncountable alphabets. Finally, for the case where the entropy rate of the pattern 
process is infinite, we examine the possible growth rates for the block entropy of 
pattern processes. 

In Section 2 we characterize the relationship between process and pattern entropy
rate for the case of a generally distributed i.i.d. process. In Section 3 we examine the
discrete Markov and the general discrete stationary ergodic process. Furthermore, in
Section 4 we extend the uncountable alphabet results of Section 2 to certain processes
with memory. In Section 5 we characterize a set of achievable asymptotic growth
rates for the block entropy of a pattern process. We conclude in Section 6 with a
brief summary of our results.



2 The I.I.D. Case 

Consider the case where $X_i$ are generated i.i.d. $\sim f$, where $f$ is an arbitrary distribution
on the arbitrary source alphabet $\mathcal{A}$. Let $S = \{x \in \mathcal{A} : \Pr\{X_1 = x\} > 0\}$.

Theorem 1 Given $\{X_i\}$ i.i.d. $\sim f$ and $\{Z_i\}$ its associated pattern process, for an
arbitrary $x_o \notin S$ define the process

$$\tilde{X}_i = \begin{cases} X_i & \text{if } X_i \in S \\ x_o & \text{otherwise.} \end{cases}$$

Then

$$H(Z) = H(\tilde{X}) = H(\tilde{X}_1),$$

regardless of the finiteness of both sides of the equality.$^2$

Since we will make use of a later corollary in the proof of Theorem 1, we present the proof
in the appendix. It should be noted that Theorem 1 was independently discovered by
Orlitsky et al. As can be seen, Theorem 1 is consistent with Example 2 and
Example 3. Note that the process $\{\tilde{X}_i\}$ is created by keeping all the point masses in
S and assigning all the remaining probability to a new point mass. This corresponds
with the result in Example 3, which suggests that the pattern of a process drawn
according to a pdf has no randomness, i.e. an entropy rate of zero. Therefore, the
only randomness in the pattern comes from the point masses and the event of falling
on a "non-point-mass-mode".

$^2$ Throughout this work, we use H to denote both entropy rate, when the argument is a process,
and entropy, when the argument is a random variable.

Example 4 Let $\{X_i\}$ be an i.i.d. process with each component drawn, with probability
1/3, as N(0,1) and, with probability 2/3, as Bernoulli 1/2. In this case $\tilde{X}_i$ is uniformly
distributed on an alphabet of size 3. Therefore, Theorem 1 gives H(Z) = log(3).
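
The computation in Example 4 can be checked numerically. The following sketch assumes entropies are measured in bits; the helper name `collapsed_entropy` is hypothetical and simply implements the construction of Theorem 1 (keep the point masses, lump everything else into one extra symbol).

```python
from math import log2


def collapsed_entropy(point_masses):
    """Entropy in bits of the distribution that keeps the given point masses
    and lumps the remaining probability into a single extra symbol."""
    probs = list(point_masses.values())
    rest = 1.0 - sum(probs)
    if rest > 1e-12:  # mass of the continuous (non-point-mass) part
        probs.append(rest)
    return -sum(p * log2(p) for p in probs if p > 0)


# Example 4: with probability 2/3 the sample is Bernoulli(1/2), which puts
# mass 1/3 on 0 and 1/3 on 1; the Gaussian part carries the remaining 1/3.
h = collapsed_entropy({0: 1 / 3, 1: 1 / 3})
```

The value `h` agrees with log(3) up to floating-point error.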

Although $|S| < \infty$ in all three examples above, it should be noted that Theorem 1
makes no such assumption.

3 Discrete Alphabet Processes 

Having characterized the relationship between process and pattern entropy rate for 
the general i.i.d. process, what can be said about processes with memory? To begin 
exploring the answer to this question we examine one of the most basic stationary 
ergodic processes with memory, the Markov process. 

A Markov Processes over Discrete Alphabets 

Although discrete Markov processes fall under the more general Theorem 3 to follow,
which deals with discrete stationary ergodic processes, there is insight gained by
examining the Markov case on its own. In particular, we will see that the proof of the
general discrete stationary ergodic result relies heavily on a version of the Shannon-
McMillan-Breiman theorem for countably infinite alphabets, found in [2], while no
such heavy machinery is necessary for the simpler Markov case. This fact is due to
the inherent structure of a Markov process and makes the Markov case an interesting
example on its own. Later on in Section 4 we will also see it is this structure which
makes the Markov process the first candidate for the extension of the uncountable
alphabet results of Section 2 to uncountable alphabet processes with memory.

The entropy rate of Markov processes is well-known. What can be said about the 
entropy rate of the associated pattern processes? We first look at the case of a first 
order Markov process with components in a countable alphabet. 

Proposition 1 Let $\{X_i\}$ be a stationary ergodic first order Markov process on the
countable alphabet $\mathcal{A}$ and let $\{Z_i\}$ be the associated pattern process. If $H(X) < \infty$,
then

H(Z) = H(X).

Proof of Proposition 1:

Let $\mu$ be the stationary distribution of the Markov process and let $P_x(y) = P(X_{t+1} = y \mid X_t = x)$
for all $x, y \in \mathcal{A}$. The data processing inequality implies $H(X^n) \ge H(Z^n)$
for all n. Hence

$$H(X) \ge \limsup_{n\to\infty} \frac{1}{n} H(Z^n).$$

To complete the proof it remains to show

$$\liminf_{n\to\infty} \frac{1}{n} H(Z^n) \ge H(X),$$

for which we will need the following three lemmas.
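Lemma 1 If $\{x_n\}_{n=1}^{\infty}$ is a non-negative sequence, then

$$\liminf_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} x_i \ge \liminf_{n\to\infty} x_n.$$

Proof of Lemma 1:

Since the sequence $\{x_n\}_{n=1}^{\infty}$ is non-negative,

$$\frac{1}{n}\sum_{i=1}^{n} x_i \ge \frac{1}{n}\sum_{i=\lfloor\sqrt{n}\rfloor}^{n} x_i \quad \forall n \in \mathbb{N}.$$

Therefore,

$$
\begin{aligned}
\liminf_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} x_i
&\ge \liminf_{n\to\infty} \frac{1}{n}\sum_{i=\lfloor\sqrt{n}\rfloor}^{n} x_i \\
&\ge \liminf_{n\to\infty} \frac{n - \lfloor\sqrt{n}\rfloor}{n}\, \inf\{x_i : i \ge \lfloor\sqrt{n}\rfloor\} \\
&= \lim_{n\to\infty} \frac{n - \lfloor\sqrt{n}\rfloor}{n}\, \inf\{x_i : i \ge \lfloor\sqrt{n}\rfloor\} \\
&= \lim_{n\to\infty} \inf\{x_i : i \ge \lfloor\sqrt{n}\rfloor\} \\
&= \liminf_{n\to\infty} x_n.
\end{aligned}
$$

□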



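Lemma 2 Let $\{A_n\}$ and $\{B_n\}$ be two sequences of events such that $\lim_{n\to\infty} P(A_n) = 1$
and $\lim_{n\to\infty} P(B_n) = b$. Then $\lim_{n\to\infty} P(A_n \cap B_n) = b$.

Proof of Lemma 2:

$P(A_n \cap B_n) \le P(B_n) \to b$. On the other hand,

$$
\begin{aligned}
\liminf_{n\to\infty} P(A_n \cap B_n) &= \liminf_{n\to\infty}\, 1 - P(A_n^c \cup B_n^c) \\
&\ge \liminf_{n\to\infty}\, 1 - P(A_n^c) - P(B_n^c) \\
&= 1 - 0 - (1 - b) = b.
\end{aligned}
$$

□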
Lemma 3 Given any $B \subseteq \mathcal{A}$ such that $|B| < \infty$,

$$\liminf_{n\to\infty} \frac{1}{n} H(Z^n) \ge \sum_{b \in B} \mu(b)\, H(\Phi_B[P_b]),$$

where we define the pmf $\Phi_B[P_b](x) = 1_B(x) P_b(x) + P_b(B^c)\, \delta_{x_o}(x)$, for an arbitrary
$x_o \notin B$.$^3$ Here $\delta_{x_o}$ is used to denote the distribution which places unit mass on $x_o$.

For an arbitrary distribution P on alphabet $\mathcal{A}$ and $B \subseteq \mathcal{A}$, $\Phi_B[P]$ can be thought of
as the distribution created by keeping distribution P on the set B and clumping the
remaining probability $P(B^c)$ on a single new point mass.
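
The clumping operation used in Lemma 3 can be sketched as follows for a pmf stored as a dictionary. The names `clump` and `entropy`, the placeholder symbol `"*"`, and the example distribution are illustrative assumptions, and entropies are taken in bits.

```python
from math import log2


def clump(P, B, x0="*"):
    """Keep the pmf P on the set B and move the mass of the complement of B
    onto the single new symbol x0 (x0 must lie outside B)."""
    assert x0 not in B
    Q = {x: p for x, p in P.items() if x in B}
    rest = sum(p for x, p in P.items() if x not in B)
    if rest > 0:
        Q[x0] = rest
    return Q


def entropy(P):
    """Entropy in bits of a pmf given as a dict symbol -> probability."""
    return -sum(p * log2(p) for p in P.values() if p > 0)


P = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
Q = clump(P, {"a", "b"})  # mass of c and d is merged onto "*"
```

Clumping is a deterministic map of the underlying random variable, so it can only reduce entropy; in this example the entropy drops from 1.75 bits to 1.5 bits.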

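Proof of Lemma 3:

Let $A(X^n)$ be the set of distinct elements in $\{X_1, \ldots, X_n\}$. Then

$$
\begin{aligned}
\liminf_{n\to\infty} \frac{1}{n} H(Z^n)
&= \liminf_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} H(Z_i \mid Z^{i-1}) \\
&\overset{(a)}{\ge} \liminf_{n\to\infty} H(Z_n \mid Z^{n-1}) \\
&\overset{(b)}{\ge} \liminf_{n\to\infty} H(Z_n \mid X^{n-1}) \\
&\overset{(c)}{=} \liminf_{n\to\infty} H(Z_n \mid X_{n-1}, A(X^{n-1})) \\
&\overset{(d)}{\ge} \liminf_{n\to\infty} H\big(Z_n \mid X_{n-1}, A(X^{n-1}), \mathbf{1}_{\{B \subseteq A(X^{n-1})\}}\big) \\
&= \liminf_{n\to\infty} \Big[ \Pr\{B \not\subseteq A(X^{n-1})\}\, H\big(Z_n \mid X_{n-1}, A(X^{n-1}), \mathbf{1}_{\{B \subseteq A(X^{n-1})\}} = 0\big) \\
&\qquad\qquad + \Pr\{B \subseteq A(X^{n-1})\}\, H\big(Z_n \mid X_{n-1}, A(X^{n-1}), \mathbf{1}_{\{B \subseteq A(X^{n-1})\}} = 1\big) \Big] \\
&\ge \liminf_{n\to\infty} \Pr\{B \subseteq A(X^{n-1})\}\, H\big(Z_n \mid X_{n-1}, A(X^{n-1}), \mathbf{1}_{\{B \subseteq A(X^{n-1})\}} = 1\big) \\
&\ge \liminf_{n\to\infty} \sum_{b \in B} \Pr\{X_{n-1} = b, B \subseteq A(X^{n-1})\}\, H\big(Z_n \mid X_{n-1} = b, A(X^{n-1}), \mathbf{1}_{\{B \subseteq A(X^{n-1})\}} = 1\big) \\
&\overset{(e)}{\ge} \liminf_{n\to\infty} \sum_{b \in B} \Pr\{X_{n-1} = b, B \subseteq A(X^{n-1})\}\, H\big(Z_n \mid X_{n-1} = b, A(X^{n-1}) = B\big) \\
&= \liminf_{n\to\infty} \sum_{b \in B} \Pr\{X_{n-1} = b, B \subseteq A(X^{n-1})\}\, H(\Phi_B[P_b]) \\
&\overset{(f)}{\ge} \sum_{b \in B} \mu(b)\, H(\Phi_B[P_b]),
\end{aligned}
$$

where (a) comes from Lemma 1, (b) from the data processing inequality, (c)
from the fact that Markovity implies that $Z_n$ is independent of $X^{n-2}$ given $X_{n-1}$
and $A(X^{n-1})$, (d) from the data processing inequality, (e) from a combination of
Jensen's inequality and the data processing inequality, and (f) from Lemma 2 since
$\Pr\{B \subseteq A(X^n)\} \to 1$. □

$^3$ Throughout this work, given a distribution $f$ and a set $B$, $\Phi_B[f]$ will denote the distribution
defined by $\Phi_B[f](x) = 1_B(x) f(x) + f(B^c)\, \delta_{x_o}(x)$ for an arbitrary $x_o \notin B$. When $f$ is a distribution,
$H(f)$ will denote the entropy of a random variable drawn according to $f$. Furthermore, $1_A$ will
denote the indicator function on the set A, while $\mathbf{1}_A$ will denote the indicator random variable on
the event A.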
Now let $\{B_k\}$ be a sequence of sets such that $B_k \subseteq \mathcal{A}$, $|B_k| < \infty$ for all k, and

$$\lim_{k\to\infty} \sum_{b \in B_k} \sum_{a \in B_k} -\mu(b) P_b(a) \log P_b(a) = \sum_{b \in \mathcal{A}} \sum_{a \in \mathcal{A}} -\mu(b) P_b(a) \log P_b(a),$$

regardless of the finiteness of both sides of the equation. Note that since the above
summands are all non-negative, such a sequence $\{B_k\}$ can always be found. Lemma 3
gives

$$\liminf_{n\to\infty} \frac{1}{n} H(Z^n) \ge \sum_{b \in B_k} \mu(b)\, H(\Phi_{B_k}[P_b]) \quad \forall k.$$

Hence, by taking $k \to \infty$, we get

$$
\begin{aligned}
\liminf_{n\to\infty} \frac{1}{n} H(Z^n)
&\ge \lim_{k\to\infty} \sum_{b \in B_k} \mu(b)\, H(\Phi_{B_k}[P_b]) \\
&\ge \lim_{k\to\infty} \sum_{b \in B_k} \sum_{a \in B_k} -\mu(b) P_b(a) \log P_b(a) \\
&\overset{(a)}{=} \sum_{b \in \mathcal{A}} \sum_{a \in \mathcal{A}} -\mu(b) P_b(a) \log P_b(a) \\
&\overset{(b)}{=} H(X),
\end{aligned}
$$

where (a) comes from the construction of $\{B_k\}$ and (b) from the fact that $\{X_i\}$ is
a finite entropy first order Markov process. Note that (b) is not necessarily true for
infinite entropy first order Markov processes. □
One should note that the proof of Proposition 1 can easily be extended to the case
of Markov processes of any order. Hence, without going through the proof, we state
the following:

Theorem 2 Let $\{X_i\}$ be a stationary ergodic Markov process of order m on the
countable alphabet $\mathcal{A}$, and let $\{Z_i\}$ be the associated pattern process. If $H(X) < \infty$,
then

H(Z) = H(X).
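
For a finite-state first order chain, the entropy rate appearing in these results is the familiar double sum of $-\mu(b) P_b(a) \log P_b(a)$ over states b and a. The sketch below evaluates it for a hypothetical two-state chain; the transition matrix, the use of power iteration, and the function names are our own assumptions, and entropies are in bits.

```python
from math import log2


def stationary(P, iters=1000):
    """Stationary distribution of a row-stochastic matrix via power iteration
    (assumes the chain is irreducible and aperiodic)."""
    n = len(P)
    mu = [1.0 / n] * n
    for _ in range(iters):
        mu = [sum(mu[b] * P[b][a] for b in range(n)) for a in range(n)]
    return mu


def markov_entropy_rate(P):
    """Entropy rate in bits: sum over b, a of -mu(b) P_b(a) log2 P_b(a)."""
    mu = stationary(P)
    n = len(P)
    return -sum(mu[b] * P[b][a] * log2(P[b][a])
                for b in range(n) for a in range(n) if P[b][a] > 0)


# Hypothetical two-state chain, used only for illustration.
P = [[0.9, 0.1],
     [0.5, 0.5]]
mu = stationary(P)
rate = markov_entropy_rate(P)
```

By Theorem 2, this quantity is also the entropy rate of the pattern process of such a chain.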



B Stationary Ergodic Processes over Discrete Alphabets 

Now that we have characterized the entropy rate relationship for the discrete Markov 
process, the natural next step would be to extend the results to all stationary ergodic 
processes on a countable alphabet. 
Theorem 3 Let $\{X_i\}$ be a stationary ergodic process with components taking values
from the countable alphabet $\mathcal{A}$, and assume $H(X) < \infty$. Let $\{Z_i\}$ be the associated
pattern process. Then

H(Z) = H(X).

We will see that as compared to the proof of Proposition 1, the structure of the proof
of Theorem 3 is slightly different, using a sandwich argument, and making use of
heavier machinery such as a version of the Shannon-McMillan-Breiman theorem for
countably infinite alphabets [2].

It is also important to note that like Theorem 2, Theorem 3 also has a finite entropy
constraint. The need to exclude processes with infinite entropy from Theorem 3 is a
direct result of the requirement of finite entropy for the countably infinite version of
the Shannon-McMillan-Breiman theorem.

The proof of Theorem 3 will use the following two claims.

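Claim 1 Let $(Z^{(n)}_{-n}, \ldots, Z^{(n)}_{0})$ denote the pattern of the sequence $(X_{-n}, \ldots, X_{0})$. Then

$$\lim_{n\to\infty} H(Z_0^{(n)} \mid X_{-n}^{-1}) = H(X_0 \mid X_{-\infty}^{-1}).$$

Proof of Claim 1:

It is sufficient to show

$$\lim_{n\to\infty} H(X_0 \mid X_{-n}^{-1}) = H(X_0 \mid X_{-\infty}^{-1}) \qquad (1)$$

and

$$\lim_{n\to\infty} \left| H(X_0 \mid X_{-n}^{-1}) - H(Z_0^{(n)} \mid X_{-n}^{-1}) \right| = 0. \qquad (2)$$

From [2] we know that $-\log P(X_0 \mid X_{-n}^{-1}) \to -\log P(X_0 \mid X_{-\infty}^{-1})$ a.s. and the
sequence is uniformly integrable, implying (1).

Moving on to (2), we see that the data processing inequality gives us $H(X_0 \mid X_{-n}^{-1}) \ge H(Z_0^{(n)} \mid X_{-n}^{-1})$
for all n. Hence it will suffice to show

$$\limsup_{n\to\infty}\; H(X_0 \mid X_{-n}^{-1}) - H(Z_0^{(n)} \mid X_{-n}^{-1}) \le 0.$$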
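Let $A(X_{-n}^{-1})$ be the set of distinct elements in $\{X_{-1}, \ldots, X_{-n}\}$. Then

$$
\begin{aligned}
&H(X_0 \mid X_{-n}^{-1}) - H(Z_0^{(n)} \mid X_{-n}^{-1}) \\
&\quad = E\left[ \sum_{x \in \mathcal{A}} P_{X_0 \mid X_{-n}^{-1}}(x) \log \frac{1}{P_{X_0 \mid X_{-n}^{-1}}(x)} \right]
 - E\left[ \sum_{x \in A(X_{-n}^{-1})} P_{X_0 \mid X_{-n}^{-1}}(x) \log \frac{1}{P_{X_0 \mid X_{-n}^{-1}}(x)} \right] \\
&\qquad - E\left[ P_{X_0 \mid X_{-n}^{-1}}\big(A(X_{-n}^{-1})^c\big) \log \frac{1}{P_{X_0 \mid X_{-n}^{-1}}\big(A(X_{-n}^{-1})^c\big)} \right] \\
&\quad = E\left[ \sum_{x \notin A(X_{-n}^{-1})} P_{X_0 \mid X_{-n}^{-1}}(x) \log \frac{1}{P_{X_0 \mid X_{-n}^{-1}}(x)} \right]
 - E\left[ P_{X_0 \mid X_{-n}^{-1}}\big(A(X_{-n}^{-1})^c\big) \log \frac{1}{P_{X_0 \mid X_{-n}^{-1}}\big(A(X_{-n}^{-1})^c\big)} \right] \\
&\quad \le E\left[ \sum_{x \notin A(X_{-n}^{-1})} P_{X_0 \mid X_{-n}^{-1}}(x) \log \frac{1}{P_{X_0 \mid X_{-n}^{-1}}(x)} \right]. \qquad (3)
\end{aligned}
$$

Since $H(X_0) < \infty$, given $\epsilon > 0$ there exists a $B \subseteq \mathcal{A}$ such that: $|B| < \infty$, if $b \in B$
then $\Pr\{X_0 = b\} > 0$, and

$$H(\Phi_{B^c}[P_{X_0}]) < \epsilon, \qquad (4)$$

where $P_{X_0}$ is the distribution of $X_0$ and $\Phi_{B^c}[P_{X_0}]$ is defined as in Lemma 3. Since

$$E\left[ \sum_{x \in B^c} P_{X_0 \mid X_{-n}^{-1}}(x) \log \frac{1}{P_{X_0 \mid X_{-n}^{-1}}(x)} \right] \le H(\Phi_{B^c}[P_{X_0}] \mid X_{-n}^{-1}) \le H(\Phi_{B^c}[P_{X_0}]) \quad \forall n,$$

(4) implies

$$E\left[ \sum_{x \in B^c} P_{X_0 \mid X_{-n}^{-1}}(x) \log \frac{1}{P_{X_0 \mid X_{-n}^{-1}}(x)} \right] < \epsilon \quad \forall n. \qquad (5)$$

By the ergodicity of $\{X_i\}$,

$$\lim_{n\to\infty} \Pr\{B \subseteq A(X_{-n}^{-1})\} = 1. \qquad (6)$$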
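From (3) and the construction of B we get

$$
\begin{aligned}
&H(X_0 \mid X_{-n}^{-1}) - H(Z_0^{(n)} \mid X_{-n}^{-1}) \\
&\quad \le E\left[ \mathbf{1}_{\{B \subseteq A(X_{-n}^{-1})\}} \sum_{x \notin A(X_{-n}^{-1})} P_{X_0 \mid X_{-n}^{-1}}(x) \log \frac{1}{P_{X_0 \mid X_{-n}^{-1}}(x)} \right]
 + E\left[ \mathbf{1}_{\{B \not\subseteq A(X_{-n}^{-1})\}} \sum_{x \in \mathcal{A}} P_{X_0 \mid X_{-n}^{-1}}(x) \log \frac{1}{P_{X_0 \mid X_{-n}^{-1}}(x)} \right] \\
&\quad \le E\left[ \mathbf{1}_{\{B \subseteq A(X_{-n}^{-1})\}} \sum_{x \in B^c} P_{X_0 \mid X_{-n}^{-1}}(x) \log \frac{1}{P_{X_0 \mid X_{-n}^{-1}}(x)} \right]
 + H(X_0 \mid X_{-n}^{-1})\, E\left[ \mathbf{1}_{\{B \not\subseteq A(X_{-n}^{-1})\}} \right] \\
&\quad \le E\left[ \mathbf{1}_{\{B \subseteq A(X_{-n}^{-1})\}} \sum_{x \in B^c} P_{X_0 \mid X_{-n}^{-1}}(x) \log \frac{1}{P_{X_0 \mid X_{-n}^{-1}}(x)} \right]
 + H(X_0)\, E\left[ \mathbf{1}_{\{B \not\subseteq A(X_{-n}^{-1})\}} \right] \\
&\quad \overset{(a)}{\le} \epsilon\, E\left[ \mathbf{1}_{\{B \subseteq A(X_{-n}^{-1})\}} \right] + H(X_0)\, E\left[ \mathbf{1}_{\{B \not\subseteq A(X_{-n}^{-1})\}} \right] \\
&\quad \le \epsilon \Pr\{B \subseteq A(X_{-n}^{-1})\} + H(X_0) \left( 1 - \Pr\{B \subseteq A(X_{-n}^{-1})\} \right),
\end{aligned}
$$

where (a) follows from (5). Taking the limit in n, (6) gives

$$\limsup_{n\to\infty}\; H(X_0 \mid X_{-n}^{-1}) - H(Z_0^{(n)} \mid X_{-n}^{-1}) \le \epsilon.$$

Since $\epsilon$ was arbitrary, (2) follows, completing the proof of Claim 1. □

Claim 2

$$H(X) = H(X_0 \mid X_{-\infty}^{-1}).$$

Proof of Claim 2:

From [2] we know that $-\log P(X_0 \mid X_{-n}^{-1}) \to -\log P(X_0 \mid X_{-\infty}^{-1})$ a.s. and the sequence
is uniformly integrable. Therefore, uniform integrability and almost sure convergence
imply convergence in mean. □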
We are now ready for: 

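Proof of Theorem 3:

$$
\begin{aligned}
H(X) &= \limsup_{n\to\infty} \frac{1}{n} H(X^n) \\
&\overset{(a)}{\ge} \limsup_{n\to\infty} \frac{1}{n} H(Z^n) \\
&\ge \liminf_{n\to\infty} \frac{1}{n} H(Z^n) \\
&= \liminf_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} H(Z_{i+1} \mid Z^i) \\
&\overset{(b)}{\ge} \liminf_{n\to\infty} H(Z_{n+1} \mid Z^n) \\
&\overset{(c)}{=} \liminf_{n\to\infty} H\big(Z_0^{(n)} \mid Z_{-n}^{(n)}, \ldots, Z_{-1}^{(n)}\big) \\
&\ge \liminf_{n\to\infty} H\big(Z_0^{(n)} \mid X_{-n}^{-1}\big) \\
&\overset{(d)}{=} H(X_0 \mid X_{-\infty}^{-1}) \\
&\overset{(e)}{=} H(X),
\end{aligned}
$$

where (a) follows from the data processing inequality, (b) from Lemma 1, (c) is a
result of stationarity, (d) results from Claim 1, and (e) results from Claim 2. As a
reminder, we use $(Z^{(n)}_{-n}, \ldots, Z^{(n)}_{0})$ to denote the pattern of the sequence $(X_{-n}, \ldots, X_0)$.

□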



4 Uncountable Alphabet Processes with Memory 

The i.i.d. results of Theorem 1 completely characterize the entropy rate relationship
for the general memoryless stationary process. So far, we have only addressed the
case of discrete processes with memory. A natural question that arises is whether the
relationship between the entropy rate of the process and that of the pattern shown in
Theorem 1 can be extended to processes with memory over an uncountable alphabet.

Besides helping to answer the question of how far we can extend the i.i.d. results
of Theorem 1, the study of the uncountable alphabet setting is also motivated by
real world processes such as discrete signals which are jittered. Any discrete process 
corrupted by Gaussian noise can be thought of as an example of such jittered pro- 
cesses. Although the motivation of lossless compression is not as applicable in the 
uncountable alphabet setting, patterns may still be useful. In general, focusing on 
the pattern allows us to map our process into a finite alphabet process. Although 
information is lost in the mapping, the pattern may still capture relevant information 
and therefore prove to be useful in certain applications such as lossy compression. 

Furthermore, the study of continuous alphabets allows us to look at the effect of 
densities on the entropy relationship. Although densities are strictly a property of 
continuous alphabets, they can be used to better understand the finite block behavior 
of the entropy relationship in the discrete setting. In particular, when looking at a 
finite block length n, it is possible for a discrete process to have a subset of the support 
which has large measure, but whose elements each have measure much smaller than 
1/n. Taking the limit in n, no such set can exist for discrete processes, but for finite 
n such a set acts like an effective density and affects the entropy relationship for 
finite blocks. An example of the role of such an effective density can be found in [TB] 
where bounds on the finite block entropy of patterns generated by i.i.d. processes 
are developed. Shamir concludes the paper with the observation that low
probability symbols contribute to the pattern entropy mostly as a single super-symbol,
which is exactly how Theorem 1 describes the contribution of the density part of a
distribution to the entropy rate of patterns generated by i.i.d. processes. Hence
the study of the continuous alphabet setting may not only extend the limit results
of Theorem 1 but also give insight into the finite block behavior of the entropy
relation for the general discrete setting. With this motivation in mind, we begin our 
examination of the entropy rate relationship in the uncountable alphabet setting by 
first looking at Markov processes. 



A Markov Processes over Uncountable Alphabets 

We observed in Section 3 that the inherent structure of the Markov process simplified
the proof of the results in the discrete case. The hope is that by looking at this
heavily structured family first we will develop some insight into the more general case
of a stationary ergodic process over an uncountable alphabet.

Although we are unable to characterize the entropy rate of the induced pattern
process for a general uncountable alphabet Markov process, the following proposition
covers a fairly general family of Markov processes. Before we state the proposition, let
us generalize some of the notation used in Proposition 1. Given an $m$-th order Markov
process $\{X_i\}$ on $\mathbb{R}$, for $x^m \in \mathbb{R}^m$ let $f_{x^m}$ be the kernel associated with the state $x^m$.
We will denote the set of point masses of $f_{x^m}$ by
$S_{x^m} = \{y : \Pr\{X_{m+1} = y \mid X^m = x^m\} > 0\}$.

Proposition 2 Let $\{X_i\}$ be a stationary ergodic Markov process on $\mathbb{R}$ of order m
such that there exists $S \subseteq \mathbb{R}$ with $S_{x^m} = S$ for all $x^m \in \mathbb{R}^m$ and $\Phi_S[f_{x^m}] = \Phi_S[f_{y^m}]$
for all $x^m, y^m \notin S^m$. Let $\{Z_i\}$ be the pattern process associated with $\{X_i\}$. Define the
process $\{\tilde{X}_i\}$ as

$$\tilde{X}_i = \begin{cases} X_i & \text{if } X_i \in S \\ x_o & \text{otherwise} \end{cases}$$

for an arbitrary $x_o \in S^c$. If $|S| < \infty$, then

$$H(Z) = H(\tilde{X}).$$

The proof of Proposition 2, as well as the remaining results of the present section,
begins with the observation that Theorem 3 implies $H(\tilde{X}) = H(\tilde{Z})$, where $\{\tilde{Z}_i\}$ is
the pattern process associated with $\{\tilde{X}_i\}$. It is then left to show that $H(\tilde{Z})$ is equal
to $H(Z)$. To this end, we show that for any given n, the difference between $H(Z^n)$
and $H(\tilde{Z}^n)$ is either bounded or grows sub-linearly in n.

Proof of Proposition 2:

If $|S| = 0$, w.p. 1 the process $\{X_i\}$ does not repeat and therefore $H(Z) = 0$. Similarly,
if $|S| = 0$, the process $\{\tilde{X}_i\}$ is a constant and therefore $H(\tilde{X}) = 0$. Hence
$H(Z) = H(\tilde{X})$.

We now look at the case where $|S| > 0$. We observe that since $\Phi_S[f_{x^m}] = \Phi_S[f_{y^m}]$
for all $x^m, y^m \notin S^m$, $\{\tilde{X}_i\}$ is a discrete Markov process of order m. Hence Theorem 2
and the fact that $\{\tilde{X}_i\}$ is stationary ergodic and has finite entropy gives

$$H(\tilde{X}) = H(\tilde{Z}). \qquad (7)$$

For $x \in S$ define the waiting time $I_x = \inf\{i > 0 : X_i = x\}$. Given $\{I_x\}_{x \in S}$ we
know the first appearance of every point in S. Hence we know the first appearance
of every point but those which are assigned zero probability by every kernel, i.e. all
but those that appear at most once w.p. 1. Therefore given $\tilde{Z}^n$ and $\{I_x\}_{x \in S}$ we can
reconstruct $Z^n$ w.p. 1 for all n. Similarly given $Z^n$ and $\{I_x\}_{x \in S}$ we can reconstruct
$\tilde{Z}^n$ for all n. Hence

$$H(Z^n) \le H(\tilde{Z}^n) + \sum_{x \in S} H(I_x) \quad \forall n \qquad (8)$$

and

$$H(Z^n) + \sum_{x \in S} H(I_x) \ge H(\tilde{Z}^n) \quad \forall n. \qquad (9)$$

Claim 3

$$H(I_x) < \infty \quad \forall x \in S.$$

Proof of Claim 3:

Given $x \in S$, define $d_{\min} = \min\{\Pr\{X_{m+1} = x \mid X^m = y^m\} : y^m \in \mathbb{R}^m\}$ and
$d_{\max} = \max\{\Pr\{X_{m+1} = x \mid X^m = y^m\} : y^m \in \mathbb{R}^m\}$. Note that since $0 < |S| < \infty$ and
$\Phi_S[f_{x^m}] = \Phi_S[f_{y^m}]$ for all $x^m, y^m \notin S^m$, $d_{\min}$ and $d_{\max}$ are well defined. By the
definition of S, $d_{\min} > 0$.

First, let us consider the case where $d_{\max} = 1$. Then, there exists an $x \in S$ and
$y^m \in \mathbb{R}^m$ such that $\Pr\{X_{m+1} = x \mid X^m = y^m\} = 1$ and $S = S_{y^m} = \{x\}$. We will first
look at the case where $y^m \in S^m$. Since $S = \{x\}$, if $y^m \in S^m$, then $y^m = x^m$, where
$x^m$ is the vector $(x, \ldots, x)$ of length m. Therefore $f_{x^m} = \delta_x$ and once the state $x^m$
is reached it cannot be exited. Hence in order for $\{X_i\}$ to be irreducible, which is
required for the process to be ergodic, it must place zero or unit probability on being
in state $x^m$. By the construction of S and the fact that $x \in S$, $\Pr\{X_m = x\} > 0$
and therefore $\Pr\{X^m = x^m\} > 0$. Hence $X_i = x$ w.p. 1 and $H(I_x) = 0$. Let us now
examine the case where $y^m \notin S^m$. Since $\Phi_S[f_{u^m}] = \Phi_S[f_{y^m}]$ for all $u^m, y^m \notin S^m$, if
$u^m \notin S^m$, then $\Pr\{X_{m+1} = x \mid X^m = u^m\} = 1$. Noting that $S = \{x\}$, we can conclude
that if $X_i \ne x$, then w.p. 1 $X_{i+1} = x$. Hence $I_x \le 2$ and $H(I_x) \le \log 2$.

We now consider the less trivial case where $d_{\max} < 1$. Since regardless of the state
$y^m$, $\Pr\{X_{m+1} = x \mid X^m = y^m\} \in [d_{\min}, d_{\max}]$, then

$$\Pr\{I_x = i\} \in \left[ d_{\min}(1 - d_{\max})^{i-1},\; d_{\max}(1 - d_{\min})^{i-1} \right] \quad \forall i \in \mathbb{N}.$$

Hence

$$
\begin{aligned}
H(I_x) &= \sum_{i=1}^{\infty} \Pr\{I_x = i\} \log \frac{1}{\Pr\{I_x = i\}} \\
&\le \sum_{i=1}^{\infty} d_{\max}(1 - d_{\min})^{i-1} \log \frac{1}{d_{\min}(1 - d_{\max})^{i-1}} \\
&= -d_{\max} \sum_{i=1}^{\infty} (1 - d_{\min})^{i-1} \left[ \log(d_{\min}) + \log\big((1 - d_{\max})^{i-1}\big) \right] \\
&= -d_{\max} \sum_{i=1}^{\infty} (1 - d_{\min})^{i-1} \log(d_{\min}) - d_{\max} \sum_{i=1}^{\infty} (i-1)(1 - d_{\min})^{i-1} \log(1 - d_{\max}) \\
&= -d_{\max} \log(d_{\min}) \sum_{i=0}^{\infty} (1 - d_{\min})^{i} - d_{\max} \log(1 - d_{\max}) \sum_{i=0}^{\infty} i\,(1 - d_{\min})^{i} \\
&= -\frac{d_{\max} \log(d_{\min})}{d_{\min}} - \frac{d_{\max}(1 - d_{\min}) \log(1 - d_{\max})}{d_{\min}^2}. \qquad (10)
\end{aligned}
$$

Since $d_{\min}, d_{\max} \in (0, 1)$, equation (10) implies $H(I_x) < \infty$. □

Claim 3 therefore gives

$$\lim_{n\to\infty} \frac{H(I_x)}{n} = 0 \quad \forall x \in S. \qquad (11)$$

Combining equations (8), (9), (11) and noting that $|S| < \infty$ gives $H(Z) = H(\tilde{Z})$.
Equation (7) then completes the proof. □
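
As a sanity check on the closed-form bound (10), the sketch below compares it with the entropy of a geometric waiting time. It is an illustration under our own assumptions (logarithms in base 2, hypothetical values for $d_{\min}$ and $d_{\max}$, and a truncated sum in place of the infinite series). When $d_{\min} = d_{\max} = p$ the waiting time is exactly geometric and the bound is met with equality.

```python
from math import log2


def bound_10(d_min, d_max):
    """Right-hand side of (10), in bits, for 0 < d_min <= d_max < 1."""
    return (-d_max * log2(d_min) / d_min
            - d_max * (1 - d_min) * log2(1 - d_max) / d_min ** 2)


def geometric_entropy(p, terms=10_000):
    """Entropy in bits of Pr{I = i} = p (1 - p)^(i - 1), i >= 1,
    approximated by summing the first `terms` terms."""
    h = 0.0
    for i in range(1, terms + 1):
        q = p * (1 - p) ** (i - 1)
        if q > 0:  # skip terms that underflow to zero
            h -= q * log2(q)
    return h


p = 0.3
h_geo = geometric_entropy(p)   # entropy of the geometric waiting time
b_eq = bound_10(p, p)          # bound (10) with d_min = d_max = p
b_loose = bound_10(0.2, 0.4)   # a strictly looser, still finite, bound
```

In the general case $d_{\min} < d_{\max}$ the bound is not tight, but it remains finite, which is all that Claim 3 requires.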

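Example 5 Let $\{X_i\}$ be a first order Markov process on [0, 1] with the following
transition kernels, represented as generalized densities on [0,1]:

$$f_0(y) = \tfrac{3}{4}\delta_0(y) + \tfrac{1}{4}\delta_1(y),$$

$$f_1(y) = \tfrac{1}{4}\delta_0(y) + \tfrac{1}{2}\delta_1(y) + \tfrac{1}{4},$$

and for $x \in (0, 1)$

$$f_x(y) = \tfrac{1}{4}\delta_0(y) + \tfrac{1}{4}\delta_1(y) + \tfrac{3}{4}\, 1_{\{(x-1/2)(y-1/2) > 0\}}(y) + \tfrac{1}{4}\, 1_{\{(x-1/2)(y-1/2) < 0\}}(y).$$

It is readily checked that the stationary distribution given the above kernels is

$$\mu(x) = \tfrac{1}{2}\delta_0(x) + \tfrac{1}{3}\delta_1(x) + \tfrac{1}{6} \quad \forall x \in [0, 1].$$

In the above case, $\{\tilde{X}_i\}$ can be thought of as a first order Markov process on the
set {0, 1/2, 1} (the value 1/2 chosen arbitrarily) with transition probabilities whose
generalized densities are

$$\tilde{f}_0(y) = \tfrac{3}{4}\delta_0(y) + \tfrac{1}{4}\delta_1(y),$$
$$\tilde{f}_1(y) = \tfrac{1}{4}\delta_0(y) + \tfrac{1}{2}\delta_1(y) + \tfrac{1}{4}\delta_{1/2}(y),$$
$$\tilde{f}_{1/2}(y) = \tfrac{1}{4}\delta_0(y) + \tfrac{1}{4}\delta_1(y) + \tfrac{1}{2}\delta_{1/2}(y).$$

Hence $\{\tilde{X}_i\}$ has the following stationary distribution

$$\tilde{\mu}(x) = \tfrac{1}{2}\delta_0(x) + \tfrac{1}{3}\delta_1(x) + \tfrac{1}{6}\delta_{1/2}(x).$$

Applying Proposition 2 gives

$$
\begin{aligned}
H(Z) = H(\tilde{X}) &= \tfrac{1}{2} H(\tilde{X}_2 \mid \tilde{X}_1 = 0) + \tfrac{1}{3} H(\tilde{X}_2 \mid \tilde{X}_1 = 1) + \tfrac{1}{6} H(\tilde{X}_2 \mid \tilde{X}_1 = 1/2) \\
&= \tfrac{1}{2} H_B\!\left(\tfrac{1}{4}\right) + \tfrac{1}{3}\cdot\tfrac{3}{2} + \tfrac{1}{6}\cdot\tfrac{3}{2} \\
&= \tfrac{7}{4} - \tfrac{3}{8}\log(3) = 1.1556,
\end{aligned}
$$

where $H_B$ is the binary entropy function.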
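
The value 1.1556 in Example 5 can be checked numerically. The sketch below rests on an assumption: the transition probabilities of the collapsed chain on {0, 1, 1/2} are our reading of the example (rows 3/4, 1/4, 0; then 1/4, 1/2, 1/4; then 1/4, 1/4, 1/2), with stationary distribution (1/2, 1/3, 1/6) and logarithms in base 2.

```python
from math import log2

# Assumed collapsed transition probabilities; rows are the current state and
# columns the next state, both in the order 0, 1, 1/2.
P = [[3 / 4, 1 / 4, 0.0],
     [1 / 4, 1 / 2, 1 / 4],
     [1 / 4, 1 / 4, 1 / 2]]
mu = [1 / 2, 1 / 3, 1 / 6]  # assumed stationary distribution


def row_entropy(row):
    """Entropy in bits of one row of the transition matrix."""
    return -sum(p * log2(p) for p in row if p > 0)


# Check that mu is invariant under P, i.e. mu P = mu.
is_stationary = all(
    abs(sum(mu[b] * P[b][a] for b in range(3)) - mu[a]) < 1e-12
    for a in range(3)
)

# Entropy rate of the collapsed chain: sum over b of mu(b) H(row b).
rate = sum(mu[b] * row_entropy(P[b]) for b in range(3))
closed_form = 7 / 4 - (3 / 8) * log2(3)
```

Under these assumptions the stationarity check passes and `rate` matches both the closed form and the value quoted in the example to four decimal places.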

B Additive White Noise-Corrupted Processes 

We now consider the case of a noise-corrupted process. Let $\{X_i\}$ be a stationary
ergodic process and $\{Y_i\}$ be its noise-corrupted version. Here we assume i.i.d. additive
noise, $\{N_i\}$, with $X_i$, $Y_i$, and $N_i$ taking values in $\mathbb{R}$. Let $S_Y$ and $S_N$ denote the set of
point masses for $Y_i$ and $N_i$ respectively. We will also define the process

$$\tilde{N}_i = \begin{cases} N_i & \text{if } N_i \in S_N \\ n_o - X_i & \text{otherwise} \end{cases}$$

for an arbitrary $n_o \notin S_Y$.

Proposition 3 Let $\{X_i\}$ be a finite alphabet stationary ergodic process. Let $\{Y_i\}$
and $\{\tilde{Y}_i\}$ denote the process $\{X_i\}$ corrupted by the additive noise $\{N_i\}$, and $\{\tilde{N}_i\}$,
respectively. Further let $\{Z_i\}$ denote the pattern process associated with $\{Y_i\}$. If
$|S_N| < \infty$, then

$$H(Z) = H(\tilde{Y}).$$

It is interesting to note that the result of Proposition 3 can be rephrased to look
more like those of Theorem 1 and Proposition 2. This is accomplished by observing
that the process $\{\tilde{Y}_i\}$, used in Proposition 3, can also be constructed by

$$\tilde{Y}_i = \begin{cases} Y_i & \text{if } Y_i \in S_Y \\ n_o & \text{otherwise} \end{cases}$$

for an arbitrary $n_o \notin S_Y$. This is the construction of $\{\tilde{X}_i\}$ used in both Theorem 1
and Proposition 2.

Proof of Proposition 3:

If $P(N_i \in S_N) = 1$ then $\tilde{N}_i = N_i$ and there is nothing to prove, so we will assume that
$P(N_i \in S_N) < 1$. Let $\{\tilde{Z}_i\}$ denote the pattern process associated with the process
$\{\tilde{Y}_i\}$. Since $\{\tilde{Y}_i\}$ is a discrete stationary ergodic process with finite entropy, Theorem 3
gives

$$H(\tilde{Z}) = H(\tilde{Y}).$$

Hence to complete the proof of Proposition 3 we just need to show that

$$H(Z) = H(\tilde{Z}).$$

Define

$$Z(n)_i = \begin{cases} Z_i & \text{if } \exists\, j \in [1, n] \setminus i \text{ s.t. } Z_i = Z_j \\ y_o & \text{otherwise} \end{cases}$$

for some arbitrary non-integer $y_o$. Clearly $Z(n)$ uniquely determines $Z^n$ and vice
versa, so in particular,

$$H(Z^n) = H(Z(n)) \quad \forall n > 0. \qquad (12)$$

Define

$$I_{n_o} = \inf\{i > 0 : N_i \in S_N^c\}.$$

We also observe the following: if $\tilde{Z}_i \ne \tilde{Z}_{I_{n_o}}$ then $\tilde{Y}_i = Y_i$, and if $\tilde{Z}_i = \tilde{Z}_{I_{n_o}}$ then w.p. 1
$Y_i \ne Y_j$ for all $j \ne i$. Hence we can construct $Z(n)$ from $\tilde{Z}^n$ and $I_{n_o}$ w.p. 1. Therefore

$$H(\tilde{Z}^n, I_{n_o}) \ge H(Z(n)) \quad \forall n > 0$$

and consequently,

$$H(\tilde{Z}^n) + H(I_{n_o}) \ge H(Z(n)) \quad \forall n > 0. \qquad (13)$$

Since $I_{n_o}$ is the waiting time for the first appearance of an element from $S_N^c$ in the
i.i.d. process $\{N_i\}$, it is geometrically distributed, and in particular has finite entropy.
Therefore

$$\lim_{n\to\infty} \frac{H(I_{n_o})}{n} = 0,$$

which combined with (12) and (13) gives

$$H(\tilde{Z}) \ge H(Z). \qquad (14)$$

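Defining $C(n)_i = \mathbf{1}_{\{\{Z(n)_i = y_o\} \cap \{Y_i \in S_Y\}\}}$, we make the following observations: w.p.
1, $\tilde{Y}_i = Y_i$ if and only if $C(n)_i = 1$ or $Z(n)_i \ne y_o$, and $\tilde{Y}_i = n_o$ if and only if $C(n)_i = 0$
and $Z(n)_i = y_o$. From these observations we conclude that given $C(n)$ and $Z(n)$ we
can reconstruct $\tilde{Z}^n$ w.p. 1 for all $n > 0$. Hence, for all $n > 0$,

$$
\begin{aligned}
H(\tilde{Z}^n) &\le H(Z(n), C(n)) \\
&\le H(Z(n)) + H(C(n)) \\
&\overset{(a)}{=} H(Z^n) + H(C(n)) \\
&\le H(Z^n) + \sum_{i=1}^{n} H(C(n)_i), \qquad (15)
\end{aligned}
$$

where (a) comes from (12).
Let

$$Pe_i^{(n)} = \Pr\{Y_i \in S_Y,\; Y_j \ne Y_i \;\forall j \in [1, n] \setminus i\}.$$

Then

$$
\begin{aligned}
Pe_i^{(n)} &= \Pr\{Y_i \in S_Y\} \Pr\{Y_j \ne Y_i \;\forall j \in [1, n] \setminus i \mid Y_i \in S_Y\} \\
&= \Pr\{Y_i \in S_Y\} \sum_{y \in S_Y} \Pr\{Y_j \ne y \;\forall j \in [1, n] \setminus i \mid Y_i = y\} \Pr\{Y_i = y \mid Y_i \in S_Y\} \\
&\le \Pr\{Y_i \in S_Y\} \sum_{y \in S_Y} \Pr\{Y_j \ne y \;\forall j \in [1, n] \setminus i \mid Y_i = y\}.
\end{aligned}
$$

Without loss of generality assume that $i \le n/2$. Then

$$
\begin{aligned}
Pe_i^{(n)} &\le \Pr\{Y_i \in S_Y\} \sum_{y \in S_Y} \Pr\{Y_j \ne y \;\forall j \in [i+1, i + n/2 - 1] \mid Y_i = y\} \\
&\overset{(a)}{=} \Pr\{Y_1 \in S_Y\} \sum_{y \in S_Y} \Pr\{Y_j \ne y \;\forall j \in [2, n/2] \mid Y_1 = y\},
\end{aligned}
$$

where (a) follows from the stationarity of Y. Let

$$Pe^{(n)} = \Pr\{Y_1 \in S_Y\} \sum_{y \in S_Y} \Pr\{Y_j \ne y \;\forall j \in [2, n/2] \mid Y_1 = y\}. \qquad (16)$$

Therefore we have

$$Pe^{(n)} \ge Pe_i^{(n)} \quad \forall i. \qquad (17)$$

By ergodicity we have

$$\lim_{n\to\infty} \Pr\{Y_j \ne y \;\forall j \in [2, n/2] \mid Y_1 = y\} = 0 \quad \forall y \in S_Y,$$

and since $|S_Y| < \infty$, (16) gives us $\lim_{n\to\infty} Pe^{(n)} = 0$. Hence there exists an N such
that $Pe^{(n)} < 1/2$ for all $n > N$ and (17) implies that

$$H_B(Pe_i^{(n)}) \le H_B(Pe^{(n)}) \quad \forall n > N, \qquad (18)$$

where $H_B$ is the binary entropy function. Substituting $Pe_i^{(n)}$ into (15) and noting
that $E(C(n)_i) = Pe_i^{(n)}$ gives

$$\frac{1}{n} H(\tilde{Z}^n) \le \frac{1}{n} H(Z^n) + \frac{1}{n} \sum_{i=1}^{n} H_B(Pe_i^{(n)}),$$

which combined with (18) gives us

$$\frac{1}{n} H(\tilde{Z}^n) \le \frac{1}{n} H(Z^n) + H_B(Pe^{(n)}).$$

Since $\lim_{n\to\infty} Pe^{(n)} = 0$,

$$H(\tilde{Z}) \le H(Z). \qquad (19)$$

Combining (14) and (19) completes the proof. □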

Note that in the case where $\{N_i\}$ is a discrete i.i.d. process, Proposition 3 agrees
with Theorem 3. In the case where $N_i$ has no point masses then, as Example 3 would
suggest, Proposition 3 gives H(Z) = 0.

Having verified that Proposition 3 is in agreement with previous results, let us
examine a case where previous theorems do not apply.

Example 6 Let $\{X_i\}$ be a first order Markov process on the set $\{1,2\}$ and let $\{N_i\}$
be i.i.d., independent of $\{X_i\}$, distributed according to the density

$$f_N(m) = \frac{1}{\sqrt{8\pi}} e^{-m^2/2} + \frac{1}{2} \delta_0(m),$$

where $\delta_0$ denotes a unit mass on 0. Further let $Y_i = X_i + N_i$ and $\{Z_i\}$ be its associated
pattern process. Since $\{Y_i\}$ is a hidden Markov process with memory on a continuous
alphabet, previous results fail to capture the relationship between $H(Y)$ and $H(Z)$.
However, Proposition 3 gives

$$H(Z) = H(\tilde{Y}), \tag{20}$$

where $\tilde{Y}_i$ is the ternary hidden Markov process given by $X_i$ with probability 1/2 and
an arbitrary $y_0 \notin \{1,2\}$ with probability 1/2. We can also use Proposition 3 to lower
bound $H(Z)$ in terms of $H(X)$. Noting that $\{\tilde{Y}_i\}$ is simply $\{X_i\}$ with erasures, we let
$I_i$ denote the event of erasure at time $i$. Then

$$\begin{aligned}
H(\tilde{Y}_n \mid \tilde{Y}^{n-1}) &\overset{(a)}{=} H(\tilde{Y}_n, I_n \mid \tilde{Y}^{n-1}) \\
&= H(I_n) + H(\tilde{Y}_n \mid \tilde{Y}^{n-1}, I_n) \\
&\overset{(b)}{=} H(I_n) + \Pr\{I_n = 0\} H(X_n \mid \tilde{Y}^{n-1}, I_n = 0) \\
&\overset{(c)}{\ge} H(I_n) + \Pr\{I_n = 0\} H(X_n \mid X^{n-1}, I_n = 0) \\
&\overset{(d)}{=} 1 + \frac{1}{2} H(X_n \mid X^{n-1}),
\end{aligned} \tag{21}$$

where (a) follows from the fact that given $\tilde{Y}_i$ we know $I_i$, (b) from the fact that given
$I_i = 1$, $\tilde{Y}_i$ is a constant and given $I_i = 0$, $\tilde{Y}_i = X_i$, (c) from a combination of the fact
that $X_n$ is independent of $I_n$ and that conditioning decreases entropy, and finally, (d)
follows from the fact that $\{I_i\}$ is an i.i.d. Bernoulli 1/2 process, independent of the
process $\{X_i\}$. Combining (20) and (21) we get

$$H(Z) \ge \frac{1}{2} H(X) + 1. \tag{22}$$

Note that (22) holds with equality when $\{X_i\}$ is i.i.d., as is readily seen to be implied
by Theorem 1.
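
The bound (22) is easy to evaluate for a concrete chain. The following is a minimal sketch, assuming the two-state Markov process of Example 6 is parameterized by its two crossover probabilities; the names `a`, `b` and all function names are hypothetical and do not come from the paper. It uses the standard closed form for the entropy rate of a stationary two-state chain.

```python
import math


def binary_entropy(p):
    """Binary entropy H_B(p) in bits, with H_B(0) = H_B(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def markov_entropy_rate(a, b):
    """Entropy rate (bits) of a stationary chain on {1, 2} with
    Pr{1 -> 2} = a and Pr{2 -> 1} = b (hypothetical parameterization)."""
    if a + b == 0:
        return 0.0  # the chain never leaves its initial state
    pi1 = b / (a + b)  # stationary probability of state 1
    pi2 = a / (a + b)  # stationary probability of state 2
    return pi1 * binary_entropy(a) + pi2 * binary_entropy(b)


def pattern_rate_lower_bound(a, b):
    """Right-hand side of (22): 1 + H(X) / 2."""
    return 1.0 + 0.5 * markov_entropy_rate(a, b)
```

With `a = b = 0.5` the chain is i.i.d. uniform on {1, 2}, so `pattern_rate_lower_bound(0.5, 0.5)` evaluates to 1.5 bits, which is the case where (22) is stated to hold with equality.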

C Stationary Ergodic Processes over Uncountable Alphabets 

Through the results of Proposition 2 and Proposition 3 we have seen two separate
families of processes with memory on uncountable alphabets that share similar
entropy rate properties. However, we are not able to extend such a relationship to
the general stationary ergodic process. An interesting question that arises is which
characteristics the Markov processes of Proposition 2 and the additive noise
processes of Proposition 3 share that allow for this characterization of the relationship
between process and pattern entropy rates. In order to help answer this question,
we examine the following Markov example which does not satisfy the requirements of
Proposition 2.

Example 7 Let $\{X_i\}$ be a first order Markov process on $[0,1]$ with a uniform
stationary distribution. Furthermore, conditioned on $X_i$, $X_{i+1} = X_i$ with probability 1/2 and
$X_{i+1}$ is drawn uniformly on $[0,1]$ with probability 1/2. It is easy to see that $\{X_i\}$ does
not satisfy the conditions of Proposition 2. In this case, $S = \{x \in [0,1] : \Pr\{X_i =
x\} > 0\} = \emptyset$ and therefore the sequence $\{\tilde{X}_i\}$ is constant and

$$H(\tilde{X}) = 0.$$

We also observe that at any time $i + 1$ we either see a new symbol with probability
1/2 or we repeat $X_i$ with probability 1/2. Therefore,

$$H(Z) = 1,$$

not $H(\tilde{X})$ as would be assumed from the relationship between pattern and process
entropy rates found in Proposition 2 and Proposition 3. Hence, unlike the processes
described in Proposition 2 and Proposition 3, we see that

$$H(Z) \neq H(\tilde{X}).$$
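
The behaviour described in Example 7 can be observed in simulation. The sketch below is illustrative only: the function names are hypothetical, and floating-point draws stand in for the uniform distribution on [0, 1]. It relabels a sequence by order of first appearance to obtain its pattern, samples the process of Example 7, and measures how often the pattern repeats its previous symbol.

```python
import random


def pattern(seq):
    """Relabel each symbol by the order of its first appearance (1-based)."""
    first_seen = {}
    out = []
    for s in seq:
        if s not in first_seen:
            first_seen[s] = len(first_seen) + 1
        out.append(first_seen[s])
    return out


def sample_example7(n, rng):
    """Draw X_1..X_n: keep the previous value w.p. 1/2, else a fresh uniform."""
    x = [rng.random()]
    for _ in range(n - 1):
        x.append(x[-1] if rng.random() < 0.5 else rng.random())
    return x


def empirical_repeat_fraction(z):
    """Fraction of steps i >= 2 at which Z_i equals Z_{i-1}."""
    repeats = sum(1 for a, b in zip(z, z[1:]) if a == b)
    return repeats / (len(z) - 1)
```

For a long sample the repeat fraction should be close to 1/2, consistent with each pattern symbol carrying one bit. Every pattern symbol is either the previous label or a brand-new one, and no symbol is an atom of the marginal distribution.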

Example 7 suggests that one of the important characteristics shared by the
processes in Proposition 2 and Proposition 3 which allow for the equality between $H(Z)$
and $H(\tilde{X})$, is the control over the repeatability of density points. In other words,
assuring that for the most part only elements in $S$ are likely to be seen more than
once. This characteristic is also demonstrated by the i.i.d. processes of Theorem 1
which share the equality between $H(Z)$ and $H(\tilde{X})$.

With this in mind we can try to extend this characteristic to general stationary
ergodic processes in hopes of developing a similar entropy rate relation. Before we
state the next theorem, let us define some notation and make rigorous the criterion
of repeatability described above. Given a stationary ergodic process $\{X_i\}$ on $\mathbb{R}$, let
$S = \{x \in \mathbb{R} : \Pr\{X_1 = x\} > 0\}$ and $R = \{x \in \mathbb{R} : \Pr\{\exists j \ge 2 : X_j = X_1 \mid X_1 =
x\} > 0\}$. Let $S = \{s_1, s_2, \ldots, s_{|S|}\}$ and $P_j = \Pr\{X_1 = s_j\}$. Without loss of generality
we will assume that the elements of $S$ are ordered such that $P_1 \ge P_2 \ge \ldots$.

Theorem 4 Let $\{X_i\}$ be a stationary ergodic process on $\mathbb{R}$ with $\Pr\{X_1 \in R\} =
\Pr\{X_1 \in S\}$. Let $\{Z_i\}$ be the associated pattern process. Define the process

$$\tilde{X}_i = \begin{cases} X_i & \text{if } X_i \in S \\ x_0 & \text{otherwise} \end{cases}$$

for some arbitrary $x_0 \notin S$. If $|S| < \infty$, then

$$H(Z) = H(\tilde{X}).$$

Otherwise, if $|S|$ is infinite and there exists $\beta > 2$ such that

$$\lim_{i \to \infty} \frac{P_i}{1/i^{\beta}} = 0,$$

then

$$H(Z) = H(\tilde{X}).$$

It should be noted that both Proposition 2 and Proposition 3 are special cases
of Theorem 4.

The requirement $\Pr\{X_1 \in R\} = \Pr\{X_1 \in S\}$ is the mathematical equivalent of
the statement that only elements in $S$ are likely to be seen more than once. While
the $\beta$-convergence requirement is a technicality needed in the proof, it may prove to
be non-essential.
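
To illustrate the $\beta$-convergence requirement, consider two hypothetical sequences of point masses (these are assumed examples, not taken from the results above). Suppose first that $P_i = c\, i^{-3}$ for some constant $c > 0$ small enough that the masses sum to at most 1. Taking $\beta = 5/2 > 2$,

$$\lim_{i \to \infty} \frac{P_i}{1/i^{\beta}} = \lim_{i \to \infty} c\, i^{5/2 - 3} = \lim_{i \to \infty} \frac{c}{\sqrt{i}} = 0,$$

so the requirement is met. By contrast, for $P_i = c / (i (\ln i)^{2})$, $i \ge 2$, and any $\beta > 2$,

$$\frac{P_i}{1/i^{\beta}} = \frac{c\, i^{\beta - 1}}{(\ln i)^{2}} \to \infty,$$

so the requirement fails for every admissible $\beta$ even though the masses are summable.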

Hence we see that controlling repeatability of density points is, essentially, a
sufficient condition for establishing equality between $H(Z)$ and $H(\tilde{X})$. Furthermore,
Example 7 suggests that it is a necessary condition. Hence, the $\beta$-convergence
requirement aside, there is reason to believe that Theorem 4 in some sense describes the
largest family of stationary ergodic processes over uncountable alphabets for which
the equality between $H(Z)$ and $H(\tilde{X})$ holds. In particular, the $\beta$-convergence
condition aside, Theorem 4 contains as special cases the i.i.d. results of Theorem 1, the
discrete setting results of Theorem 3, the Markov results of Proposition 2, and the
noise-corrupted process results of Proposition 3.

The proof of Theorem 4 begins with the observation that Theorem 3 can be used
to show that $H(\tilde{X}) = H(\tilde{Z})$. We are then left to show that $H(\tilde{Z}) = H(Z)$. This
is done in a two step process. We first show that by making use of the information
contained in the indexes of first appearance for a finite set $B \subseteq S$, we can bound the
difference between $H(Z^n)$ and $H(\tilde{Z}^n)$. This bound is a function of $\Pr\{X_1 \in B\}$, $|B|$,
and $n$ and is true for all $n$ and $B \subseteq S$. Finally, the limit condition on $\{P_i\}_{i=1}^{\infty}$ allows
us to pick a sequence $\{B_n\}_{n=1}^{\infty}$ for which the upper bound on the difference between
$H(Z^n)$ and $H(\tilde{Z}^n)$ grows sub-linearly in $n$, completing the proof of Theorem 4.
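
The sub-linear growth can be checked numerically. In the proof that follows, the sets are $B_n = \{s_1, \ldots, s_{m(n)}\}$ with $m(n) = \lceil n^{\gamma} \rceil$, and the resulting gap between the block entropies is at most $1 + (n^{\gamma} + 1)\log(n+1) + n^{2-\alpha}\log n$ with $\alpha = \gamma(\beta - 1) > 1$. The sketch below evaluates this gap per symbol. The particular choice of $\gamma$ (the midpoint of the admissible interval) and the function names are assumptions made for illustration; the proof only requires that some admissible $\gamma$ exists.

```python
import math


def choose_gamma(beta):
    """Return a gamma in (0, 1) with alpha = gamma * (beta - 1) > 1.

    Assumes beta > 2 and picks the midpoint of (1 / (beta - 1), 1);
    any other point of that interval would serve equally well.
    """
    if beta <= 2:
        raise ValueError("beta must exceed 2")
    return 0.5 * (1.0 / (beta - 1.0) + 1.0)


def per_symbol_gap(n, beta):
    """(1/n) * [1 + (n^gamma + 1) log2(n + 1) + n^(2 - alpha) log2(n)]."""
    gamma = choose_gamma(beta)
    alpha = gamma * (beta - 1.0)
    gap = (1.0
           + (n ** gamma + 1.0) * math.log2(n + 1)
           + n ** (2.0 - alpha) * math.log2(n))
    return gap / n
```

For `beta = 3` this gives `gamma = 0.75` and `alpha = 1.5`, and the per-symbol gap shrinks as `n` grows, which is what forces the two entropy rates to coincide.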

Proof of Theorem 4

We can assume that $\Pr\{X_1 \in S\} < 1$, otherwise there is nothing to prove. If $|S| < \infty$,
then $H(\tilde{X}_1) < \infty$. In the case where $|S|$ is infinite, then the fact that $\beta > 2$ and

$$\lim_{i \to \infty} \frac{P_i}{1/i^{\beta}} = 0,$$

implies that $H(\tilde{X}_1) < \infty$. Since $\{\tilde{X}_i\}$ is a discrete stationary ergodic process with
finite entropy, Theorem 3 gives

$$H(\tilde{X}) = H(\tilde{Z}), \tag{23}$$

where $\{\tilde{Z}_i\}$ is the associated pattern process. To complete the proof of Theorem 4
we need to show that

$$H(\tilde{Z}) = H(Z).$$

For $x \in S$ define $I_x = \inf\{i > 0 : X_i = x\}$ and $I_x^{(n)} = I_x \mathbf{1}\{I_x \le n\} - \mathbf{1}\{I_x > n\}$. Hence
$I_x^{(n)}$ has an alphabet of size $n + 1$ and therefore

$$H(I_x^{(n)}) \le \log(n + 1). \tag{24}$$
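Given $B \subseteq S$ such that $|B| < \infty$, let $P_B = \Pr\{X_1 \in S \cap B^c\}$ and
$C_B^{(n)} = \mathbf{1}\{X_1, X_2, \ldots, X_n \notin S \cap B^c\}$.

If $C_B^{(n)} = 1$ then given $\{I_x^{(n)}\}_{x \in B}$, we know all the labels of the elements of $S$ which
appear in $X^n$. Hence for $n \ge 1$ and conditioned on $C_B^{(n)} = 1$, given $Z^n$ and $\{I_x^{(n)}\}_{x \in B}$
we can reconstruct $\tilde{Z}^n$. Therefore

$$\begin{aligned}
\Pr\{C_B^{(n)} = 1\} H(\tilde{Z}^n \mid C_B^{(n)} = 1)
&\le \Pr\{C_B^{(n)} = 1\} H(\tilde{Z}^n, \{I_x^{(n)}\}_{x \in B} \mid C_B^{(n)} = 1) \\
&\le \Pr\{C_B^{(n)} = 1\} H(Z^n, \{I_x^{(n)}\}_{x \in B} \mid C_B^{(n)} = 1) \\
&\le \Pr\{C_B^{(n)} = 1\} H(Z^n \mid C_B^{(n)} = 1) + \Pr\{C_B^{(n)} = 1\} H(\{I_x^{(n)}\}_{x \in B} \mid C_B^{(n)} = 1) \\
&\le \Pr\{C_B^{(n)} = 1\} H(Z^n \mid C_B^{(n)} = 1) + \Pr\{C_B^{(n)} = 0\} H(Z^n \mid C_B^{(n)} = 0) \\
&\quad + \Pr\{C_B^{(n)} = 1\} H(\{I_x^{(n)}\}_{x \in B} \mid C_B^{(n)} = 1) + \Pr\{C_B^{(n)} = 0\} H(\{I_x^{(n)}\}_{x \in B} \mid C_B^{(n)} = 0) \\
&= H(Z^n \mid C_B^{(n)}) + H(\{I_x^{(n)}\}_{x \in B} \mid C_B^{(n)}) \\
&\le H(Z^n) + H(\{I_x^{(n)}\}_{x \in B}) \\
&\le H(Z^n) + |B| \log(n + 1) \quad \forall n \ge 1.
\end{aligned} \tag{25}$$

Given $i$, we now wish to examine the probability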

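$$\begin{aligned}
\Pr\{\exists j > i : X_j = X_i \mid X_i \notin S\} &\overset{(a)}{=} \Pr\{\exists j \ge 2 : X_j = X_1 \mid X_1 \notin S\} \\
&\overset{(b)}{\le} \Pr\{\exists j \ge 2 : X_j = X_1 \mid X_1 \notin R\} \\
&\le \sum_{j=2}^{\infty} \Pr\{X_j = X_1 \mid X_1 \notin R\},
\end{aligned} \tag{26}$$

where (a) follows from the stationarity of the process $\{X_i\}$ and (b) from the fact that
$S \subseteq R$ and $\Pr\{X_1 \in R\} = \Pr\{X_1 \in S\}$ implies that $\Pr\{X_1 \in R \cap S^c\} = 0$.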

To further bound $\Pr\{\exists j > i : X_j = X_i \mid X_i \notin S\}$ we now examine $\Pr\{X_j =
X_1 \mid X_1 \notin R\}$. Let $f_j(x_1, x_j)$ be the measure on $(X_1, X_j)$ given $X_1 \notin R$. Therefore

$$\Pr\{X_j = X_1 \mid X_1 \notin R\} = \int_{x_1 \in R^c} \int_{x_j = x_1} f_j(x_1, x_j)\, dx_j\, dx_1. \tag{27}$$

Assume that $\Pr\{X_j = X_1 \mid X_1 \notin R\} > 0$, then (27) implies that there exists $x_1 \in R^c$
such that

$$\int_{x_j = x_1} f_j(x_1, x_j)\, dx_j > 0.$$

This is only possible if

$$\Pr\{X_1 = X_j = x_1 \mid X_1 \notin R\} > 0. \tag{28}$$

Therefore,

$$\begin{aligned}
\Pr\{X_j = X_1 \mid X_1 = x_1\} &\overset{(a)}{=} \Pr\{X_j = X_1 \mid X_1 = x_1, x_1 \notin R\} \\
&\ge \Pr\{X_j = X_1 \mid X_1 = x_1, x_1 \in R^c\} \Pr\{X_1 = x_1 \mid x_1 \notin R\} \\
&= \Pr\{X_j = X_1 = x_1 \mid x_1 \notin R\} \\
&\overset{(b)}{>} 0,
\end{aligned} \tag{29}$$

where (a) follows from the fact that $x_1 \in R^c$ and (b) from (28). By definition of
$R$, (29) implies that $x_1 \in R$. This is a contradiction since $x_1 \in R^c$. Hence

$$\Pr\{X_j = X_1 \mid X_1 \notin R\} = 0 \quad \forall j \ge 2$$

and (26) gives

$$\Pr\{\exists j > i : X_j = X_i \mid X_i \notin S\} = 0. \tag{30}$$

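Therefore w.p. 1, only elements in $S$ will appear more than once. Hence conditioned
on $C_B^{(n)} = 1$, given $\{I_x^{(n)}\}_{x \in B}$ we know the labels of all the elements in $X^n$ except those
that appear at most once w.p. 1. Therefore for $n \ge 1$ and conditioned on $C_B^{(n)} = 1$,
given $\{I_x^{(n)}\}_{x \in B}$ and $\tilde{Z}^n$, we can reconstruct $Z^n$ w.p. 1. Hence

$$\begin{aligned}
\Pr\{C_B^{(n)} = 1\} H(Z^n \mid C_B^{(n)} = 1)
&\le \Pr\{C_B^{(n)} = 1\} H(Z^n, \{I_x^{(n)}\}_{x \in B} \mid C_B^{(n)} = 1) \\
&\le \Pr\{C_B^{(n)} = 1\} H(\tilde{Z}^n, \{I_x^{(n)}\}_{x \in B} \mid C_B^{(n)} = 1) \\
&\le \Pr\{C_B^{(n)} = 1\} H(\tilde{Z}^n \mid C_B^{(n)} = 1) + \Pr\{C_B^{(n)} = 1\} H(\{I_x^{(n)}\}_{x \in B} \mid C_B^{(n)} = 1) \\
&\le \Pr\{C_B^{(n)} = 1\} H(\tilde{Z}^n \mid C_B^{(n)} = 1) + \Pr\{C_B^{(n)} = 0\} H(\tilde{Z}^n \mid C_B^{(n)} = 0) \\
&\quad + \Pr\{C_B^{(n)} = 1\} H(\{I_x^{(n)}\}_{x \in B} \mid C_B^{(n)} = 1) + \Pr\{C_B^{(n)} = 0\} H(\{I_x^{(n)}\}_{x \in B} \mid C_B^{(n)} = 0) \\
&= H(\tilde{Z}^n \mid C_B^{(n)}) + H(\{I_x^{(n)}\}_{x \in B} \mid C_B^{(n)}) \\
&\le H(\tilde{Z}^n) + H(\{I_x^{(n)}\}_{x \in B}) \\
&\le H(\tilde{Z}^n) + |B| \log(n + 1) \quad \forall n \ge 1.
\end{aligned} \tag{31}$$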

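If $|S| < \infty$, then set $B = S$. Therefore $P_B = 0$, $C_B^{(n)} = 1$ w.p. 1 and equations (25)
and (31) give

$$\begin{aligned}
H(\tilde{Z}^n) &\le H(Z^n) + |S| \log(n + 1) \quad \forall n \ge 1, \\
H(Z^n) &\le H(\tilde{Z}^n) + |S| \log(n + 1) \quad \forall n \ge 1.
\end{aligned}$$

The finiteness of $|S|$ then implies

$$H(\tilde{Z}) = H(Z).$$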

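To complete the proof of Theorem 4 we need to address the case where $|S|$ is
infinite. Choose $\gamma \in (0, 1)$ such that $\alpha = \gamma(\beta - 1) > 1$. Such a $\gamma$ can be found since
$\beta > 2$. Let $m(n) = \lceil n^{\gamma} \rceil$. Since

$$\lim_{i \to \infty} \frac{P_i}{1/i^{\beta}} = 0$$

and $m(n)$ is an unbounded increasing sequence, there exists $N > 0$ such that

$$m(n)^{\beta} P_{m(n)} < 1 \quad \forall n > N.$$

Construct a sequence of sets $B_n \subseteq S$ as follows, $B_n = \{s_1, \ldots, s_{m(n)}\}$ for all $n > N$.
Therefore

$$\begin{aligned}
P_{B_n} &= \Pr\{X_1 \in S \cap B_n^c\} \\
&= \sum_{j=m(n)+1}^{\infty} P_j \\
&\overset{(a)}{\le} \sum_{j=m(n)+1}^{\infty} \frac{1}{j^{\beta}} \\
&\le \int_{m(n)}^{\infty} \frac{1}{x^{\beta}}\, dx \\
&= \frac{1}{\beta - 1} m(n)^{1 - \beta} \\
&\le \frac{1}{\beta - 1} n^{\gamma(1 - \beta)} = \frac{1}{\beta - 1} \cdot \frac{1}{n^{\alpha}} \\
&\overset{(b)}{<} \frac{1}{n^{\alpha}} \quad \forall n > N,
\end{aligned} \tag{32}$$

where (a) follows from the fact that $n > N$ and (b) from the fact that $\beta > 2$. Therefore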

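$$\begin{aligned}
H(\tilde{Z}^n) &\le H(\tilde{Z}^n, C_{B_n}^{(n)}) \\
&= H(C_{B_n}^{(n)}) + H(\tilde{Z}^n \mid C_{B_n}^{(n)}) \\
&\le 1 + \Pr\{C_{B_n}^{(n)} = 1\} H(\tilde{Z}^n \mid C_{B_n}^{(n)} = 1) + \Pr\{C_{B_n}^{(n)} = 0\} H(\tilde{Z}^n \mid C_{B_n}^{(n)} = 0) \\
&\le 1 + \Pr\{C_{B_n}^{(n)} = 1\} H(\tilde{Z}^n \mid C_{B_n}^{(n)} = 1) + n P_{B_n} H(\tilde{Z}^n \mid C_{B_n}^{(n)} = 0) \\
&\le 1 + \Pr\{C_{B_n}^{(n)} = 1\} H(\tilde{Z}^n \mid C_{B_n}^{(n)} = 1) + n P_{B_n} H(\tilde{Z}^n) \\
&\overset{(a)}{\le} 1 + \Pr\{C_{B_n}^{(n)} = 1\} H(\tilde{Z}^n \mid C_{B_n}^{(n)} = 1) + n^2 P_{B_n} \log n \\
&\overset{(b)}{\le} 1 + H(Z^n) + |B_n| \log(n + 1) + n^2 P_{B_n} \log n \\
&= 1 + H(Z^n) + m(n) \log(n + 1) + n^2 P_{B_n} \log n \\
&\overset{(c)}{\le} 1 + H(Z^n) + (n^{\gamma} + 1) \log(n + 1) + n^{2 - \alpha} \log n \quad \forall n > N,
\end{aligned} \tag{33}$$

where (a) follows from the fact that $\tilde{Z}_i$ has an alphabet of at most $i$, (b) follows from
(25) and (c) from (32). Similarly using (31) we get

$$H(Z^n) \le 1 + H(\tilde{Z}^n) + (n^{\gamma} + 1) \log(n + 1) + n^{2 - \alpha} \log n \quad \forall n > N. \tag{34}$$

Since $\gamma \in (0, 1)$ and $\alpha > 1$, (33) and (34) give

$$H(\tilde{Z}) = H(Z),$$

completing the proof of Theorem 4. □

5 Growth Rates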



Now that we have explored the relationship between the entropy rate of the original
process and the associated pattern process, we turn our attention to possible growth
rates for the block entropy of a pattern sequence. In other words, having looked at
the limit, we now look at the asymptotic growth rates.

Theorem 5 For any $\delta > 0$ there exists an i.i.d. process $\{X_i\}$ such that its associated
pattern sequence satisfies

$$\lim_{n \to \infty} \frac{H(Z_n \mid Z^{n-1})}{(\log n)^{1 - \delta}} = \infty. \tag{35}$$

Note that since $Z_{n+1}$ lies in an alphabet of size at most $n + 1$ we have, for any process,
not even necessarily stationary,

$$\limsup_{n \to \infty} \frac{H(Z_{n+1} \mid Z^n)}{\log n} \le 1.$$

Theorem 5 then says that the growth rate $\log n$ is essentially, up to a factor which
is sub-polynomial in $\log n$, achievable by an i.i.d. process. It should also be noted
that the bounds on the block entropy of patterns generated by i.i.d. processes found
in [T31 1121 EH HZj can be used to examine possible asymptotic growth rates for the 
entropy of pattern processes. An example of such an application can be found in [14].

We dedicate the remainder of this section to the proof of Theorem 5. Let $X_i$ be
i.i.d. $\sim f_X$, where $X_i$ takes values in an arbitrary space $\mathcal{A}$, and $\{Z_i\}$ be the associated
pattern sequence. Define $S = \{x \in \mathcal{A} : \Pr\{X_i = x\} > 0\}$.

Claim 4 $H(\Phi_B[f_X])$ is increasing in $B$, i.e., for any $B_1 \subseteq B_2 \subseteq S$

$$H(\Phi_{B_1}[f_X]) \le H(\Phi_{B_2}[f_X]).$$

Proof of Claim 4

This is nothing but a data-processing inequality. Indeed, let $Y \sim \Phi_{B_2}[f_X]$ and let

$$U = \begin{cases} Y & \text{if } Y \in B_1 \\ x_0 & \text{otherwise.} \end{cases}$$

Clearly $U \sim \Phi_{B_1}[f_X]$ and $U$ is a deterministic function of $Y$, thus the claim follows.

□

Proposition 4 For any $B \subseteq S$

$$H(Z_{n+1} \mid Z^n) \ge H(\Phi_B[f_X]) \left[ 1 - |B| \exp\left( -n \min_{b \in B} \Pr\{X = b\} \right) \right].$$

Proof of Proposition 4

Letting $P_X^n$ denote the distribution of $X^n$, for any $B \subseteq S$,

$$\begin{aligned}
H(Z_{n+1} \mid Z^n) &\ge H(Z_{n+1} \mid X^n) \\
&= \int H(Z_{n+1} \mid X^n = x^n)\, dP_X^n(x^n) \\
&= \int H(\Phi_{A(x^n)}[f_X])\, dP_X^n(x^n) \\
&\ge \int_{\{x^n : B \subseteq A(x^n)\}} H(\Phi_{A(x^n)}[f_X])\, dP_X^n(x^n) \\
&\ge H(\Phi_B[f_X]) \Pr\{B \subseteq A(X^n)\},
\end{aligned} \tag{36}$$

where the last inequality follows from the monotonicity property in Claim 4 and
$A(X^n)$ is defined to be the set of distinct elements in $\{X_1, \ldots, X_n\}$. Now, for any
$B \subseteq S$,

$$\begin{aligned}
\Pr(B \not\subseteq A(X^n)) &= \Pr\left( \bigcup_{b \in B} \{b \notin A(X^n)\} \right) \\
&\le \sum_{b \in B} \Pr\{b \notin A(X^n)\} \\
&= \sum_{b \in B} (1 - \Pr\{X = b\})^n \\
&\le |B| \left( 1 - \min_{b \in B} \Pr\{X = b\} \right)^n \\
&\le |B| \exp\left( -n \min_{b \in B} \Pr\{X = b\} \right).
\end{aligned} \tag{37}$$

The proposition now follows by combining (36) with (37). □

Besides being used in the proof of Theorem 5, Proposition 4 also gives the following
corollary which will be used in the proof of Theorem 1.

Corollary 6

$$\liminf_{n \to \infty} H(Z_{n+1} \mid Z^n) \ge H(\Phi_S[f_X]),$$

regardless of the finiteness of the right side of the inequality.

Proof of Corollary 6

Take a sequence $\{B_k\}$ of finite subsets $B_k \subseteq S$ satisfying

$$\lim_{k \to \infty} H(\Phi_{B_k}[f_X]) = H(\Phi_S[f_X]).$$

Proposition 4 implies, for each $k$,

$$\liminf_{n \to \infty} H(Z_{n+1} \mid Z^n) \ge H(\Phi_{B_k}[f_X]), \tag{38}$$

completing the proof by taking $k \to \infty$ on the right side of (38). □
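
Proposition 4 and the union bound (37) can be checked on a small example. The sketch below assumes the atoms of the source are given as a Python dict from symbol to probability, with any remaining mass treated as non-atomic; all function names are hypothetical. It computes the entropy of $\Phi_B[f_X]$ by merging everything outside $B$ into a single extra symbol, the exact probability that some element of $B$ is missing from $X^n$ (by inclusion-exclusion), and the exponential bound used in the proof.

```python
import math
from itertools import combinations


def collapsed_entropy(pmf, B):
    """Entropy (bits) of Phi_B[f_X]: symbols in B keep their mass and
    all remaining mass is merged into one extra symbol x_0."""
    kept = [pmf[b] for b in B]
    rest = 1.0 - sum(kept)
    probs = kept + ([rest] if rest > 1e-15 else [])
    return -sum(p * math.log2(p) for p in probs if p > 0)


def miss_probability_exact(pmf, B, n):
    """Pr{B is not a subset of A(X^n)} for n i.i.d. draws,
    computed by inclusion-exclusion over subsets of B."""
    total = 0.0
    for k in range(1, len(B) + 1):
        for T in combinations(B, k):
            total += (-1) ** (k + 1) * (1.0 - sum(pmf[b] for b in T)) ** n
    return total


def miss_probability_bound(pmf, B, n):
    """Right-hand side of (37): |B| exp(-n min_b Pr{X = b})."""
    return len(B) * math.exp(-n * min(pmf[b] for b in B))


def proposition4_lower_bound(pmf, B, n):
    """H(Phi_B[f_X]) * (1 - |B| exp(-n min_b Pr{X = b}))."""
    return collapsed_entropy(pmf, B) * (1.0 - miss_probability_bound(pmf, B, n))
```

Enlarging `B` can only increase `collapsed_entropy`, in line with Claim 4, while it loosens the exponential factor. Choosing `B` is therefore a trade-off, which the proof of Theorem 5 resolves by letting the set grow slowly with `n`.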



Proof of Theorem 5

Consider the case where $\{X_i\}$ are generated i.i.d. $\sim P$, where $P$ is a distribution on
$\mathbb{N}$ and $p_j = P(X_i = j)$ is a non-increasing sequence. Letting $D_l = \sum_{i=1}^{l} p_i \log \frac{1}{p_i}$, it
follows by taking $B = B_l = \{1, \ldots, l\}$ in Proposition 4 that

$$\begin{aligned}
H(Z_{n+1} \mid Z^n) &\ge H(\Phi_{B_l}[f_X]) \left[ 1 - |B_l| \exp\left( -n \min_{b \in B_l} \Pr(X = b) \right) \right] \\
&\ge D_l \left[ 1 - l \exp(-n p_l) \right],
\end{aligned}$$

implying, by the arbitrariness of $l$,

$$H(Z_{n+1} \mid Z^n) \ge \max_l D_l \left[ 1 - l \exp(-n p_l) \right]. \tag{39}$$
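Consider now the distribution

$$p_i = P(X_1 = i) = \begin{cases} 0 & i = 1 \\ \dfrac{c(\varepsilon)}{i (\ln i)^{1+\varepsilon}} & i \ge 2 \end{cases} \tag{40}$$

for some $\varepsilon \in (0, 1)$, where $c(\varepsilon)$ is the normalization constant. In this case

$$\begin{aligned}
D_l &= \sum_{i=2}^{l} \frac{c(\varepsilon)}{i (\ln i)^{1+\varepsilon}} \log \frac{i (\ln i)^{1+\varepsilon}}{c(\varepsilon)} \\
&= \sum_{i=2}^{l} \frac{c(\varepsilon)}{i (\ln i)^{1+\varepsilon}} \bigl( \log i + (1 + \varepsilon) \log(\ln i) - \log c(\varepsilon) \bigr) \\
&= \sum_{i=2}^{l} \frac{c(\varepsilon)}{\ln 2} \left( \frac{\ln i}{i (\ln i)^{1+\varepsilon}} \right)
 + \sum_{i=2}^{l} \frac{c(\varepsilon)(1 + \varepsilon)}{\ln 2} \left( \frac{\ln(\ln i)}{i (\ln i)^{1+\varepsilon}} \right)
 - \sum_{i=2}^{l} \frac{c(\varepsilon)}{\ln 2} \left( \frac{\ln c(\varepsilon)}{i (\ln i)^{1+\varepsilon}} \right) \\
&= \sum_{i=2}^{l} \frac{c(\varepsilon)}{\ln 2} \left( \frac{1}{i (\ln i)^{\varepsilon}} \right)
 + \sum_{i=2}^{l} \frac{c(\varepsilon)(1 + \varepsilon)}{\ln 2} \left( \frac{\ln(\ln i)}{i (\ln i)^{1+\varepsilon}} \right)
 - \sum_{i=2}^{l} \frac{c(\varepsilon)}{\ln 2} \left( \frac{\ln c(\varepsilon)}{i (\ln i)^{1+\varepsilon}} \right).
\end{aligned}$$

Observe that there exists $N_1' \in \mathbb{N}$ such that

$$\begin{aligned}
D_l &\ge \sum_{i=2}^{l} \frac{c(\varepsilon)}{2 \ln 2} \left( \frac{1}{i (\ln i)^{\varepsilon}} \right) \quad \forall l > N_1' \\
&= \frac{c(\varepsilon)}{2 \ln 2} \sum_{i=2}^{l} \frac{1}{i (\ln i)^{\varepsilon}} \\
&\ge \frac{c(\varepsilon)}{2 \ln 2} \int_{x=2}^{l+1} \frac{1}{x (\ln x)^{\varepsilon}}\, dx \quad \forall l > N_1' \\
&= \frac{c(\varepsilon)}{2 \ln 2} \left( \frac{(\ln(l + 1))^{1-\varepsilon}}{1 - \varepsilon} - \frac{(\ln 2)^{1-\varepsilon}}{1 - \varepsilon} \right).
\end{aligned}$$

Therefore there exists $N_2' > N_1'$ such that

$$D_l \ge \frac{c(\varepsilon)}{4 \ln 2} (\ln(l + 1))^{1-\varepsilon} \quad \forall l > N_2'.$$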

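Let $N_2 = (N_2' + 1)^{(1+\varepsilon)/(1-\varepsilon)}$, $l_n = \lfloor n^{(1-\varepsilon)/(1+\varepsilon)} \rfloor$, and choose $N_3 > N_2$ such that

$$l_n = \lfloor n^{(1-\varepsilon)/(1+\varepsilon)} \rfloor > 1 \quad \forall n > N_3.$$

Combining (39) and (40) we get

$$H(Z_{n+1} \mid Z^n) \ge D_{l_n} \left[ 1 - l_n \exp(-n p_{l_n}) \right] \quad \forall n > N_3 \tag{41}$$

$$\begin{aligned}
&\ge \frac{c(\varepsilon)}{4 \ln 2} (\ln(l_n + 1))^{1-\varepsilon} \left[ 1 - l_n \exp(-n p_{l_n}) \right] \quad \forall n > N_3 \\
&\ge \frac{c(\varepsilon)}{4 \ln 2} \left( \ln\left( \lfloor n^{(1-\varepsilon)/(1+\varepsilon)} \rfloor + 1 \right) \right)^{1-\varepsilon} \left[ 1 - \lfloor n^{(1-\varepsilon)/(1+\varepsilon)} \rfloor \exp(-n p_{l_n}) \right] \quad \forall n > N_3 \\
&\ge \frac{c(\varepsilon)}{4 \ln 2} \left( \ln\left( n^{(1-\varepsilon)/(1+\varepsilon)} \right) \right)^{1-\varepsilon} \left[ 1 - n^{(1-\varepsilon)/(1+\varepsilon)} \exp(-n p_{l_n}) \right] \quad \forall n > N_3 \\
&\ge \frac{c(\varepsilon)}{4 \ln 2} \left( \frac{1 - \varepsilon}{1 + \varepsilon} \right)^{1-\varepsilon} (\ln n)^{1-\varepsilon} \left[ 1 - n \exp(-n p_{l_n}) \right] \quad \forall n > N_3.
\end{aligned} \tag{42}$$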



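Since

$$p_l = P(X_1 = l) = \begin{cases} 0 & l = 1 \\ \dfrac{c(\varepsilon)}{l (\ln l)^{1+\varepsilon}} & l \ge 2 \end{cases}$$

there exists $N_4 > N_3$ such that

$$p_{l_n} \ge l_n^{-(1+\varepsilon)} \quad \forall n > N_4.$$

From (42) we get

$$\begin{aligned}
H(Z_{n+1} \mid Z^n) &\ge \frac{c(\varepsilon)}{4 \ln 2} \left( \frac{1 - \varepsilon}{1 + \varepsilon} \right)^{1-\varepsilon} (\ln n)^{1-\varepsilon} \left[ 1 - n \exp(-n p_{l_n}) \right] \quad \forall n > N_3 \\
&\ge \frac{c(\varepsilon)}{4 \ln 2} \left( \frac{1 - \varepsilon}{1 + \varepsilon} \right)^{1-\varepsilon} (\ln n)^{1-\varepsilon} \left[ 1 - n \exp\left( -n\, l_n^{-(1+\varepsilon)} \right) \right] \quad \forall n > N_4 \\
&\ge \frac{c(\varepsilon)}{4 \ln 2} \left( \frac{1 - \varepsilon}{1 + \varepsilon} \right)^{1-\varepsilon} (\ln n)^{1-\varepsilon} \left[ 1 - n \exp\left( -n \left( \lfloor n^{(1-\varepsilon)/(1+\varepsilon)} \rfloor \right)^{-(1+\varepsilon)} \right) \right] \quad \forall n > N_4 \\
&\ge \frac{c(\varepsilon)}{4 \ln 2} \left( \frac{1 - \varepsilon}{1 + \varepsilon} \right)^{1-\varepsilon} (\ln n)^{1-\varepsilon} \left[ 1 - n \exp\left( -n \left( n^{(1-\varepsilon)/(1+\varepsilon)} \right)^{-(1+\varepsilon)} \right) \right] \quad \forall n > N_4 \\
&\ge \frac{c(\varepsilon)}{4 \ln 2} \left( \frac{1 - \varepsilon}{1 + \varepsilon} \right)^{1-\varepsilon} (\ln n)^{1-\varepsilon} \left[ 1 - n \exp(-n^{\varepsilon}) \right] \quad \forall n > N_4.
\end{aligned} \tag{43}$$

Finally, there exists $N_5 > N_4$ such that $N_5 \ge 2$ and

$$1 - n \exp(-n^{\varepsilon}) \ge 1 - \varepsilon \quad \forall n > N_5.$$

From (43) we get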



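$$\begin{aligned}
H(Z_{n+1} \mid Z^n) &\ge \frac{c(\varepsilon)}{4 \ln 2} \left( \frac{1 - \varepsilon}{1 + \varepsilon} \right)^{1-\varepsilon} (\ln n)^{1-\varepsilon} \left[ 1 - n \exp(-n^{\varepsilon}) \right] \quad \forall n > N_4 \\
&\ge \frac{c(\varepsilon)}{4 \ln 2} \left( \frac{1 - \varepsilon}{1 + \varepsilon} \right)^{1-\varepsilon} (1 - \varepsilon) (\ln n)^{1-\varepsilon} \quad \forall n > N_5.
\end{aligned}$$

Thus (35) is satisfied under the distribution in (40) with any $\varepsilon \in (0, \min\{\delta, 1\})$. □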
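
The slow growth of the partial entropies $D_l$ under the distribution (40) can be observed numerically. The following is a rough sketch only: the normalization constant $c(\varepsilon)$ is an infinite sum, so it is approximated here by truncating at a hypothetical cutoff `M` and adding the integral $\int_M^{\infty} dx / (x (\ln x)^{1+\varepsilon}) = 1 / (\varepsilon (\ln M)^{\varepsilon})$ for the tail. The function names and the cutoff are assumptions made for illustration.

```python
import math


def make_pmf(eps, M=100_000):
    """Approximate p_i = c(eps) / (i (ln i)^(1 + eps)) for 2 <= i <= M.

    c(eps) is estimated by adding an integral approximation of the
    tail sum beyond M; this truncation is an assumption of the sketch.
    Returns (c, [p_2, p_3, ..., p_M]).
    """
    w = [1.0 / (i * math.log(i) ** (1.0 + eps)) for i in range(2, M + 1)]
    tail = 1.0 / (eps * math.log(M) ** eps)  # integral of the weight from M to infinity
    c = 1.0 / (sum(w) + tail)
    return c, [c * wi for wi in w]


def partial_entropy(p, l):
    """D_l = sum_{i=2}^{l} p_i log2(1 / p_i), where p[0] holds p_2."""
    return sum(pi * math.log2(1.0 / pi) for pi in p[: l - 1])


def growth_lower_bound(c, eps, l):
    """(c(eps) / (4 ln 2)) * (ln(l + 1))^(1 - eps)."""
    return c / (4.0 * math.log(2.0)) * math.log(l + 1) ** (1.0 - eps)
```

For example, with `eps = 0.5` one can compare `partial_entropy(p, l)` against `growth_lower_bound(c, 0.5, l)` for increasing `l`. The proof only guarantees the inequality for `l` large enough, but in this instance it already holds at small `l`.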

6 Conclusion 

We have characterized the relationship between the entropy rate of a source and that 
of its pattern process for i.i.d., discrete Markov, discrete stationary ergodic, and a 
broad family of uncountable alphabet stationary ergodic processes. Besides deter- 
mining the fundamental compression limits for a pattern sequence, the relationship 
between pattern and process entropy rate helps to quantify how much of the total 
information contained in the original stochastic process is encompassed in its pattern 
sequence. For the case where the pattern entropy rate is infinite, we characterized 
achievable growth rates for the block entropy of a pattern sequence. 

A Proof of Theorem 1

If $|S| = 0$, then $\Pr\{\exists\, i \neq j : X_i = X_j\} = 0$. Therefore $H(Z^n) = 0$ for all $n$. This
implies that $H(Z) = 0$ which agrees with Theorem 1. Hence we just need to prove
Theorem 1 for the case where $|S| > 0$.

Note that Corollary 6 and the fact that, regardless of the finiteness of $H(\tilde{X}_1)$,
$H(Z_1) < \infty$ and $H(Z_n \mid Z^{n-1}) < \infty$ for all $n$ give

$$\liminf_{n \to \infty} \frac{H(Z^n)}{n} \ge H(\tilde{X}_1). \tag{44}$$

For the reverse inequality, look at

$$\begin{aligned}
\limsup_{n \to \infty} \frac{H(Z^n)}{n} &\overset{(a)}{\le} \limsup_{n \to \infty} \frac{H(\tilde{X}^n)}{n} \\
&\overset{(b)}{=} H(\tilde{X}_1),
\end{aligned} \tag{45}$$

where (a) comes from the fact that given $\tilde{X}^n$ we can reconstruct $Z^n$ w.p. 1 and (b)
from the fact that $\{\tilde{X}_i\}$ is an i.i.d. process. Combining (44) and (45) and noting that
$\{X_i\}$ is an i.i.d. process completes the proof of Theorem 1. □